From paul.boddie at ementor.no  Mon Aug  2 13:01:38 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Aug  2 13:01:46 2004
Subject: [Web-SIG] AMK's "Web applications (again)"
Message-ID: <FD72AF7813F1294C95279EC6D9784A2F01570ED3@100NOOSLMSG004.common.alpharoot.net>

Hello,

Having just caught up with the Daily Python-URL after being away for a
few days, I saw that there had been some commentary on writing Web
applications in Python. Has anyone given any more thought to the various
standardisation activities that were discussed on this list?

Paul

From pje at telecommunity.com  Mon Aug  2 19:23:09 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug  2 19:19:08 2004
Subject: [Web-SIG] AMK's "Web applications (again)"
In-Reply-To: <FD72AF7813F1294C95279EC6D9784A2F01570ED3@100NOOSLMSG004.common.alpharoot.net>
Message-ID: <5.1.1.6.0.20040802131446.053f00f0@mail.telecommunity.com>

At 01:01 PM 8/2/04 +0200, Paul Boddie wrote:
>Hello,
>
>Having just caught up with the Daily Python-URL after being away for a
>few days, I saw that there had been some commentary on writing Web
>applications in Python. Has anyone given any more thought to the various
>standardisation activities that were discussed on this list?

One comment on the blog caught my eye:

"""
You know, this rant (that Python has too much vs. Java) always bugged me, 
exactly for the remark you made: who can make sense out of what's available 
*just in Jakarta* without pulling hairs? The Java developers I work with 
choose their "faith", and that's the road they travel on, and rarely track 
the other frameworks.

But: all these frameworks *deploy* in a standard fashion, and (I think) the 
frameworks can happily co-exist in the same deployment.
That's the part that I find lacking in Python: all the apps have their own 
deployment strategies, and often seem to relish in Python's ease of setting 
up micro-servers.

Posted by Roger Espinosa at July 28, 2004 06:32 AM
"""

This is what the WSGI proposal is meant to tackle.  I'm currently still 
putting off a rewrite of the proposal to address the issues raised by folks 
on this list, and to extend it slightly to better support architectures 
that want to either be asynchronous or to pipeline request preprocessors or 
response postprocessors.

From paul.boddie at ementor.no  Tue Aug  3 14:29:17 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Tue Aug  3 14:29:21 2004
Subject: [Web-SIG] AMK's "Web applications (again)"
Message-ID: <FD72AF7813F1294C95279EC6D9784A2F01570FC9@100NOOSLMSG004.common.alpharoot.net>

Phillip J. Eby [mailto:pje@telecommunity.com] wrote:
>

[Roger Espinosa]

> > But: all these frameworks *deploy* in a standard fashion, and (I think)
> > the frameworks can happily co-exist in the same deployment.
> > That's the part that I find lacking in Python: all the apps have their
> > own deployment strategies, and often seem to relish in Python's ease of
> > setting up micro-servers.
>
> This is what the WSGI proposal is meant to tackle.  I'm currently still
> putting off a rewrite of the proposal to address the issues raised by
> folks on this list, and to extend it slightly to better support
> architectures that want to either be asynchronous or to pipeline request
> preprocessors or response postprocessors.

Originally, when I looked at the proposal, I interpreted it as a means
to run different server frameworks on top of a common "transport"
container - a kind of multiplexing arrangement. However, looking at the
text now, what the WSGI proposal [1] also seems to advocate (looking at
certain examples [2] for clarification) is the ability of applications
to use existing framework APIs but then to have those applications
deployed on other frameworks - to a Webware application, for example,
all frameworks look like Webware.

But then, looking deeper at the proposal, I wonder how the WSGI-defined
concepts fit in with those framework APIs. If input, output, errors and
environ have standardised semantics, it occurs to me that the semantics
of the framework-specific objects used by the applications must be
translated to the WSGI semantics as defined in the proposal. Otherwise,
it appears that these semantics get exposed to the applications
themselves, making them non-standard within the context of the framework
API they are using.

Anyway, I'm curious as to how the described interface relates to the
WebStack API in purpose and functionality. The principal objective of
WebStack is to provide applications with a common API across different
underlying server frameworks - to WebStack applications, all frameworks
appear the same (they provide the WebStack API). Perhaps there is some
overlap between the semantic translation part of WSGI and the objects
provided by WebStack. Moreover, it might also be interesting to contrast
the concept of a WebStack framework adapter with parts of the WSGI
proposal.

On the subject of deployment, however, the Java-style standardised
deployment with things like .war files and descriptors doesn't
necessarily deliver everything that the hype would suggest, as anyone
who has had to work with more than one application server would know.
Moreover, the more Web applications appear like normal Python packages
and programs, the easier they are likely to be to deploy, especially if
the means of deployment doesn't involve uploading some archive file to
some Web application and clicking a "restart" button - something that
turned many people away from Zope, I'd wager. In certain circumstances,
the lightweight "micro-servers" really do have their advantages...

Paul

[1] http://mail.python.org/pipermail/web-sig/2003-December/000394.html
[2] http://mail.python.org/pipermail/web-sig/2003-December/000417.html

From pje at telecommunity.com  Tue Aug  3 17:12:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug  3 17:08:50 2004
Subject: [Web-SIG] AMK's "Web applications (again)"
In-Reply-To: <FD72AF7813F1294C95279EC6D9784A2F01570FC9@100NOOSLMSG004.common.alpharoot.net>
Message-ID: <5.1.1.6.0.20040803110315.01ea4ec0@mail.telecommunity.com>

At 02:29 PM 8/3/04 +0200, Paul Boddie wrote:
>Phillip J. Eby [mailto:pje@telecommunity.com] wrote:
> >
>
>[Roger Espinosa]
>
> > > But: all these frameworks *deploy* in a standard fashion, and (I think)
> > > the frameworks can happily co-exist in the same deployment.
> > > That's the part that I find lacking in Python: all the apps have their
> > > own deployment strategies, and often seem to relish in Python's ease of
> > > setting up micro-servers.
> >
> > This is what the WSGI proposal is meant to tackle.  I'm currently still
> > putting off a rewrite of the proposal to address the issues raised by
> > folks on this list, and to extend it slightly to better support
> > architectures that want to either be asynchronous or to pipeline request
> > preprocessors or response postprocessors.
>
>Originally, when I looked at the proposal, I interpreted it as a means
>to run different server frameworks on top of a common "transport"
>container - a kind of multiplexing arrangement. However, looking at the
>text now, what the WSGI proposal [1] also seems to advocate (looking at
>certain examples [2] for clarification) is the ability of applications
>to use existing framework APIs but then to have those applications
>deployed on other frameworks - to a Webware application, for example,
>all frameworks look like Webware.

Right, that's not a bad way of putting it.  When I redo the PEP for WSGI, 
the terminology will hopefully be clearer.  In fact, "WSGI" now stands for 
"Web Server Gateway Interface", so it might be more correct to say that "a 
Webware application could run on any web server that supports WSGI."


>But then, looking deeper at the proposal, I wonder how the WSGI-defined
>concepts fit in with those framework APIs. If input, output, errors and
>environ have standardised semantics, it occurs to me that the semantics
>of the framework-specific objects used by the applications must be
>translated to the WSGI semantics as defined in the proposal.

Absolutely.  However, those are by-and-large the semantics of HTTP and CGI, 
which form the basis for most existing web servers and gateway 
protocols.  Any framework that supports being run under CGI (or FastCGI, or 
any of the FastCGI clones) is relatively simple to adapt to WSGI.


>Anyway, I'm curious as to how the described interface relates to the
>WebStack API in purpose and functionality. The principal objective of
>WebStack is to provide applications with a common API across different
>underlying server frameworks - to WebStack applications, all frameworks
>appear the same (they provide the WebStack API). Perhaps there is some
>overlap between the semantic translation part of WSGI and the objects
>provided by WebStack. Moreover, it might also be interesting to contrast
>the concept of a WebStack framework adapter with parts of the WSGI
>proposal.

I took a brief look at WebStack yesterday; my impression is that under 
WSGI, your framework adapters would be unnecessary, because you'd just have 
one for WSGI.  From WSGI's point of view, WebStack is just another Python 
web framework.


>On the subject of deployment, however, the Java-style standardised
>deployment with things like .war files and descriptors doesn't
>necessarily deliver everything that the hype would suggest, as anyone
>who has had to work with more than one application server would know.

Well, WSGI isn't trying to standardize to *that* level as yet.  :)  That 
would be a different PEP at some later stage after there's some field 
experience with the interface.


>Moreover, the more Web applications appear like normal Python packages
>and programs, the easier they are likely to be to deploy, especially if
>the means of deployment doesn't involve uploading some archive file to
>some Web application and clicking a "restart" button - something that
>turned many people away from Zope, I'd wager. In certain circumstances,
>the lightweight "micro-servers" really do have their advantages...

Well, all of the frameworks are free to "innovate" as much as they like in 
this respect.  Different systems have different user 
audiences.  Personally, I wouldn't mind someday seeing a standardized .zip 
format for deploying applications to a web server... as long as you can 
build it with the distutils!

From pje at telecommunity.com  Thu Aug  5 18:19:50 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug  5 18:15:45 2004
Subject: [Web-SIG] Asynchronous streaming in WSGI
Message-ID: <5.1.1.6.0.20040805120355.01ed23e0@mail.telecommunity.com>

I've been looking at a possible change to the WSGI protocol to address some 
issues raised by Grisha and Ian.  But I'm not sure that the change is best, 
given the range of existing platforms and applications that may *currently* 
use asynchronous streaming of responses, even though in many ways the 
change would handle asynchronous streaming *better*.

Let me explain.  The previous WSGI proposal was based on an interface like:

     def runCGI(inp,out,err,env):
         # do everything

The modified interface that I've been playing with in peak.web is:

     def handle_http(env):
         return status_string,header_list,output_iterable

The ideas that changed here are:

* Separate status from headers and output
* Don't require servers to parse headers or create an output buffer
* Allow lengthy output to be streamed *after* the function returns, to 
avoid tying up a task thread in multi-threaded servers
* Allow non-CGI variables (e.g. 'wsgi.input_stream', 'wsgi.error_stream', 
'wsgi.version', 'wsgi.multi_threaded', etc.) in the environment to avoid a 
separate configuration method and simplify chaining of processors

As a result of these changes, it should also be much easier to write 
request preprocessors, response postprocessors, and other kinds of 
intermediaries between the web server and the actual 
application/frameworks, because less parsing and buffering are 
required.  Last, but not least, an interface like this should be easier to 
implement in asynchronous web servers, because they can just invoke 
'iterator.next()' when they need another block to send out.
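
A minimal sketch of the pull model described above, assuming a
hypothetical ``serve_once()`` server loop and a ``send`` callback that
stands in for the network connection.  It is written in present-day
Python 3, where ``iterator.next()`` is spelled ``next(iterator)``, and it
is not code from peak.web:

```python
def handle_http(env):
    """Hypothetical application using the triple-returning interface."""
    def body():
        # Blocks are produced lazily, only when the server asks for them.
        for i in range(3):
            yield "block %d\n" % i
    return "200 OK", [("Content-Type", "text/plain")], body()


def serve_once(app, env, send):
    """Hypothetical server loop: call the app, then pull blocks one by one."""
    status, headers, output = app(env)
    send("Status: %s\n" % status)
    for name, value in headers:
        send("%s: %s\n" % (name, value))
    send("\n")
    iterator = iter(output)
    while True:
        try:
            block = next(iterator)   # an async server would do this on demand
        except StopIteration:
            break
        send(block)


sent = []
serve_once(handle_http, {"REQUEST_METHOD": "GET"}, sent.append)
```

Because the application has already returned by the time the first block
is pulled, a threaded server could hand the iterable to an I/O loop and
free the worker thread.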

I think these are improvements in the direction that folks requested, 
*except* for one issue: unbuffered streaming output in existing code can't 
use this.  A prime example is Zope, whose response.write() method does 
streaming output.  Under the revised WSGI, there's nothing to write *to*, 
so such existing code would have to run in a separate thread from the web 
server and communicate via a queue.  This doesn't seem like a great idea.

So, there are several possible ways to deal with this:

1) Stick with the old interface
2) Go with the newer interface, and try to lobby frameworks that support 
this type of "push" to make changes to support it
3) Publish both interfaces, and push for a stdlib module that can convert 
between them
4) Some other idea I haven't thought of  :)
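
As a rough idea of what a converter for option 3 might look like in one
direction, the sketch below wraps an old-style ``runCGI``-like function so
that it can be called through the newer interface.  The name
``adapt_old_to_new`` and the naive header parsing are assumptions made for
illustration, and the code uses present-day Python 3.  Note that it buffers
the whole response, so it gives up exactly the streaming behaviour
discussed above:

```python
import io


def adapt_old_to_new(run_cgi):
    """Wrap an old-style (inp, out, err, env) callable as handle_http(env)."""
    def handle_http(env):
        out = io.StringIO()                       # capture everything written
        run_cgi(env.get("wsgi.input_stream"), out,
                env.get("wsgi.error_stream"), env)
        head, _, body = out.getvalue().partition("\n\n")
        status = "200 OK"                         # CGI default when no Status:
        headers = []
        for line in head.splitlines():
            name, _, value = line.partition(":")
            if name.lower() == "status":
                status = value.strip()
            else:
                headers.append((name, value.strip()))
        return status, headers, [body]
    return handle_http


def old_app(inp, out, err, env):
    """Hypothetical old-style application that pushes its output."""
    out.write("Status: 404 Not Found\nContent-Type: text/plain\n\nmissing\n")


status, headers, output = adapt_old_to_new(old_app)({})
```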

Opinions?  Questions?  Ideas?


From pje at telecommunity.com  Mon Aug  9 01:59:14 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug  9 02:12:15 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
Message-ID: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>

This version is an almost complete rewrite, based on a new interface 
approach developed by Tony Lownds and me.  As you'll see, it tries to
address as much of the list's feedback as I could absorb and remember.  So, 
please be patient with me if I missed taking something into account.

As always, your comments and feedback are appreciated.


PEP: XXX
Title: Python Web Server Gateway Interface v1.0
Version: $Revision: 1.1 $
Last-Modified: $Date: 2004/08/08 19:48:42 $
Author: Phillip J. Eby <pje@telecommunity.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004


Abstract
========

This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote
web application portability across a variety of web servers.


Rationale
=========

Python currently boasts a wide variety of web application
frameworks, such as Zope, Quixote, Webware, Skunkware, PSO,
and Twisted Web -- to name just a few [1]_.  This wide variety
of choices can be a problem for new Python users, because
generally speaking, their choice of web framework will limit
their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application
frameworks available, Java's "servlet" API makes it possible
for applications written with any Java web application framework
to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web
servers for Python -- whether those servers are written in
Python (e.g. Medusa), embed Python (e.g. mod_python), or
invoke Python via a gateway protocol (e.g. CGI, FastCGI,
etc.) -- would separate choice of framework from choice
of web server, freeing users to choose a pairing that suits
them, while freeing framework and server developers to focus
on their area of specialty.

This PEP, therefore, proposes a simple and universal interface
between web servers and web applications or frameworks: the
Python Web Server Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI *must* be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Thus, simplicity of implementation on *both* the server and framework
sides of the interface is absolutely critical to the utility of the
WSGI interface, and is therefore the principal criterion for any
design decisions.  (It should also be easy to create request
preprocessors, response postprocessors, and other "middleware"
components that look like an application to their containing server,
while acting as a server for their contained applications.)

Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author.  WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects
and cookie handling would just get in the way of existing frameworks'
handling of these issues.  Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that
is not already available in deployed versions of Python.  Therefore,
new standard library modules are not proposed or required by this
specification, and nothing in WSGI requires a Python version greater
than 1.5.2.  (It would be a good idea, however, for future versions
of Python to include support for this interface in web servers
provided by the standard library.)

Finally, the current version of WSGI does not prescribe any
particular mechanism for "deploying" an application for use with a
web server or server gateway.  At the present time, this is
necessarily implementation-defined by the server or gateway.
After a sufficient number of servers and frameworks have implemented
WSGI to provide field experience with varying deployment
requirements, it may make sense to create another PEP, describing
a deployment standard for WSGI servers and application frameworks.


Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side,
and the "application" side.  The server side invokes a callable
object that is provided by the application side.  The specifics
of how that object is provided are up to the server or gateway.
It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an
instance of the server or gateway, and supply it with the
application object.  Other servers and gateways may use
configuration files or other mechanisms to specify where the
application object should be imported from.

The application object is simply a callable object that accepts
two arguments.  The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a ``__call__`` method are all acceptable for
use as an application object.  Here are two example application
objects; one is a function, and the other is a class::

     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         headers = [('Content-type','text/plain')]
         write = start_response(status, headers)
         write('Hello world!\n')


     class AppClass:
         """Much the same thing, but as a class"""

         def __init__(self, environ, start_response):
             self.environ = environ
             self.start = start_response

         def __iter__(self):
             status = '200 OK'
             headers = [('Content-type','text/plain')]
             self.start(status, headers)

             yield "Hello world!\n"
             for i in range(1,11):
                 yield "Extra line %s\n" % i


The server or gateway invokes the application once for each request
it receives from a web browser.  To illustrate, here is a simple
CGI gateway, implemented as a function taking an application object
(all error handling omitted)::

     import os, sys

     def run_with_cgi(application):

         environ = {}
         environ.update(os.environ)
         environ['wsgi.input']        = sys.stdin
         environ['wsgi.errors']       = sys.stderr
         environ['wsgi.version']      = '1.0'
         environ['wsgi.multithread']  = False
         environ['wsgi.multiprocess'] = True

         def start_response(status,headers):
             # Emit the CGI status line and the headers, then the blank
             # line that separates the headers from the response body.
             sys.stdout.write("Status: %s\n" % status)
             for key,val in headers:
                 sys.stdout.write("%s: %s\n" % (key,val))
             sys.stdout.write("\n")
             return sys.stdout.write

         result = application(environ, start_response)
         if result:
             try:
                 for data in result:
                     sys.stdout.write(data)
             finally:
                 if hasattr(result,'close'):
                     result.close()

In the next section, we will specify the precise semantics that
these illustrations are examples of.
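
For experimenting with application objects outside a CGI environment,
a gateway can also be sketched entirely in memory.  The following is
only an illustration in present-day Python 3: ``run_in_memory``,
``push_app`` and ``pull_app`` are hypothetical names, and ordinary text
strings stand in for the byte strings this document calls for.

```python
def run_in_memory(application, environ):
    """Hypothetical test gateway: collect the response instead of printing it."""
    response = {"status": None, "headers": None}
    chunks = []

    def start_response(status, headers):
        response["status"] = status
        response["headers"] = list(headers)
        return chunks.append          # plays the role of write()

    result = application(environ, start_response)
    if result is not None:
        try:
            for data in result:
                chunks.append(data)
        finally:
            if hasattr(result, "close"):
                result.close()
    return response["status"], response["headers"], "".join(chunks)


def push_app(environ, start_response):
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write("pushed\n")                 # returns None: no further output


def pull_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["pulled\n"]               # the gateway iterates over this


pushed = run_in_memory(push_app, {})
pulled = run_in_memory(pull_app, {})
```

Both styles of application produce the same kind of result, which is
the point of allowing ``write()`` and iteration side by side.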


Specification Details
=====================

The application object must accept two positional arguments.  For
the sake of illustration, we have named them ``environ``, and
``start_response``, but they are not required to have these names.
A server or gateway *must* invoke the application object using
positional (not keyword) arguments.

The first parameter is a dictionary object, containing CGI-style
environment variables.  This object *must* be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary
in any way it desires.  The dictionary must also include certain
WSGI-required variables (described in a later section), and may
also include server-specific extension variables, named according
to a convention that will be described below.

The second parameter is a callable accepting two positional
arguments: a status string of the form ``"999 Message here"``,
and a list of ``(header_name,header_value)`` tuples describing the
HTTP response header.  This callable must return another callable
that takes one parameter: a string to write as part of the HTTP
response body.

The application object may return either ``None`` (indicating that
there is no additional output), or it may return a non-empty
iterable yielding strings.  (For example, it could be a
generator-iterator that yields strings, or it could be a
sequence such as a list of strings.)  If the application
returns an iterable, and the iterable has a ``close()`` method,
the server or gateway *must* call that method upon completion
of the current request, whether the request was completed normally,
or terminated early due to an error.  (This is to support resource
release by the application.  The specific protocol is intended to
support PEP 325, and also the simple case of an application returning
an open text file.)
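
The ``close()`` rule can be sketched as follows, assuming a hypothetical
``FileBody`` iterable on the application side and a hypothetical
``send_body()`` helper on the server side (present-day Python 3, for
illustration only):

```python
class FileBody:
    """Hypothetical iterable response body that owns a resource."""

    def __init__(self, lines):
        self.lines = lines
        self.closed = False

    def __iter__(self):
        return iter(self.lines)

    def close(self):
        # A real application would release a file, lock or connection here.
        self.closed = True


def send_body(result, send):
    """Server side: always call close() if it exists, even after an error."""
    try:
        for data in result:
            send(data)
    finally:
        if hasattr(result, "close"):
            result.close()


ok_body = FileBody(["a\n", "b\n"])
received = []
send_body(ok_body, received.append)

failed_body = FileBody(["a\n"])

def broken_send(data):
    raise IOError("client went away")

try:
    send_body(failed_body, broken_send)
except IOError:
    pass                              # the request failed, close() still ran
```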


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain CGI environment
variables, as defined by the Common Gateway Interface specification
[2]_.  In addition, it must contain the following WSGI-defined
variables:

=====================  =============================================
Variable               Value
=====================  =============================================
``wsgi.version``       The string ``"1.0"``

``wsgi.input``         An input stream from which the HTTP request
                       body can be read.

``wsgi.errors``        An output stream to which error output can
                       be written.  For most servers, this will be
                       the server's error log.

``wsgi.multithread``   This value should be true if the application
                       object may be simultaneously invoked by
                       another thread in the same process, and
                       false otherwise.

``wsgi.multiprocess``  This value should be true if an equivalent
                       application object may be simultaneously
                       invoked by another process, and false
                       otherwise.
=====================  =============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables.  These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with
a name that is unique to the defining server or gateway.  For
example, ``mod_python`` might define variables with names like
``mod_python.some_variable``.

Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
dictionary.  Also note that CGI-defined variables must be strings,
if they are present at all.  It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.
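
A server might assemble and check the dictionary along these lines.
This is a sketch only, in present-day Python 3: ``make_environ`` and
the ``example_server.request_id`` extension variable are hypothetical,
and ``io.BytesIO`` stands in for a real request body stream.

```python
import io
import sys


def make_environ(cgi_vars, body=b""):
    """Build an ``environ`` dict as described above (hypothetical helper)."""
    environ = {}                      # a plain dict, not a subclass
    for key, value in cgi_vars.items():
        if value is None:
            continue                  # missing variables are left out
        if not isinstance(value, str):
            raise TypeError("CGI variable %r must be a string" % key)
        environ[key] = value
    environ["wsgi.version"] = "1.0"
    environ["wsgi.input"] = io.BytesIO(body)
    environ["wsgi.errors"] = sys.stderr
    environ["wsgi.multithread"] = False
    environ["wsgi.multiprocess"] = True
    # Server-specific extension, prefixed with a name unique to the server.
    environ["example_server.request_id"] = "r1"
    return environ


env = make_environ({"REQUEST_METHOD": "GET", "REMOTE_USER": None})
```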


Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support
the following methods:

===================  ==========  ========
Method               Files       Notes
===================  ==========  ========
``read(size)``       ``input``
``readline()``       ``input``   1
``readlines(hint)``  ``input``   2
``__iter__()``       ``input``
``flush()``          ``errors``  3
``write(str)``       ``errors``
``writelines(seq)``  ``errors``
===================  ==========  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported,
    as it may be complex for server authors to implement, and is not
    often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for
    both caller and implementer.  The application is free not to
    supply it, and the server or gateway is free to ignore it.

3. Since the ``errors`` stream may not be rewound, a container is
    free to forward write operations immediately, without buffering.
    In this case, the ``flush()`` method may be a no-op.  Portable
    applications, however, cannot assume that output is unbuffered
    or that ``flush()`` is a no-op.  They must call ``flush()`` if
    they need to ensure that output has in fact been written.  (For
    example, to minimize intermingling of data from multiple processes
    writing to the same error log.)

The methods listed in the table above *must* be supported by all
servers conforming to this specification.  Applications conforming
to this specification *must not* use any other methods or attributes
of the ``input`` or ``errors`` objects.  In particular, applications
*must not* attempt to close these streams, even if they possess
``close()`` methods.
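
One way a server could hold applications to the portable subset is to
wrap its input stream so that only the listed methods are visible.  The
``InputWrapper`` class below is a hypothetical sketch in present-day
Python 3, not something this specification requires:

```python
import io


class InputWrapper:
    """Hypothetical wrapper exposing only the input methods listed above."""

    def __init__(self, stream):
        self._stream = stream

    def read(self, size):
        return self._stream.read(size)

    def readline(self):
        # The optional "size" argument is deliberately not accepted (note 1).
        return self._stream.readline()

    def readlines(self, hint=None):
        # The server is free to ignore the hint (note 2).
        return self._stream.readlines()

    def __iter__(self):
        return iter(self._stream)


wrapped = InputWrapper(io.BytesIO(b"first\nsecond\n"))
first = wrapped.readline()
rest = wrapped.readlines()
```

Since the wrapper has no ``close()`` method, an application that tries
to close the stream fails immediately instead of silently relying on
one server's behaviour.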


The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a
two-argument callable, used to begin the HTTP response and return
a ``write()`` function.  The first parameter it takes is a "status"
string, of the form ``"999 Message here"``, where ``999`` is replaced
with the HTTP status code, and ``Message here`` is replaced with the
appropriate message text.  The string *must* be pure 7-bit ASCII,
containing no control characters.  In particular, it must not be
terminated with a carriage return or linefeed.

The second parameter accepted by the ``start_response()`` callable
must be a sequence of ``(header_name,header_value)`` tuples.  Each
``header_name`` must be a valid HTTP header name, without a
trailing colon or other punctuation.  Each ``header_value``
*must not* include a trailing carriage return or linefeed: it
should be a raw header value.  (These requirements are to minimize
the complexity of parsing required by servers, gateways, and
intermediate response processors that need to inspect or modify
response headers.)

The return value of the ``start_response()`` callable is a
one-argument callable, that accepts strings to write as part of the
HTTP response body.
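
The requirements on the two arguments can be checked mechanically.  The
sketch below is an illustration in present-day Python 3; the function
names and the regular expressions are assumptions, chosen to
approximate "three digits, a space, printable ASCII" and the HTTP
token characters, rather than anything this specification mandates:

```python
import re

STATUS_RE = re.compile(r"[0-9]{3} [\x20-\x7e]+")
HEADER_NAME_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def check_response_start(status, headers):
    """Raise ValueError if the arguments break the rules described above."""
    if not STATUS_RE.fullmatch(status):
        raise ValueError("malformed status string: %r" % (status,))
    for name, value in headers:
        if not HEADER_NAME_RE.fullmatch(name):
            raise ValueError("malformed header name: %r" % (name,))
        if value.endswith(("\r", "\n")):
            raise ValueError("header value ends with CR or LF: %r" % (value,))


def rejected(status, headers):
    """Convenience for trying the check out: True if it raises."""
    try:
        check_response_start(status, headers)
    except ValueError:
        return True
    return False
```

A middleware component could run such a check before passing the call
on to the real ``start_response()``.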


Implementation/Application Notes
================================


Unicode
-------

HTTP does not directly support Unicode, and neither does this
interface.  All encoding/decoding must be handled by the application;
all strings and streams passed to or from the server must be standard
Python byte strings, not Unicode objects.  The result of using a
Unicode object where a string object is required is undefined.
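
In practice this means an application encodes its text before handing
it over.  A minimal sketch, assuming present-day Python 3, where
``bytes`` plays the role of the byte strings described here and header
values remain ordinary strings (``unicode_app`` is a hypothetical
name):

```python
def unicode_app(environ, start_response):
    text = "Gr\u00fc\u00dfe\n"              # text as the application sees it
    body = text.encode("utf-8")             # encoding is the app's job
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ])
    return [body]


captured = []
body = unicode_app({}, lambda status, headers: captured.append(headers))
```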


Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since
virtually all servers/gateways will make such requests.


Error Handling
--------------

Servers *should* trap and log exceptions raised by
applications, and *may* continue to execute, or attempt to shut down
gracefully.  Applications *should* avoid allowing exceptions to
escape their execution scope, since the result of uncaught exceptions
is server-defined.
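
A server-side trap might look roughly like this.  The sketch is in
present-day Python 3; ``call_safely`` is a hypothetical helper, and
returning ``None`` to signal failure is just one possible
server-defined outcome:

```python
import io
import traceback


def call_safely(application, environ, start_response):
    """Run one request; log any escaping exception to ``wsgi.errors``."""
    try:
        result = application(environ, start_response)
        # close() handling is omitted here to keep the sketch short.
        return list(result) if result is not None else []
    except Exception:
        errors = environ["wsgi.errors"]
        errors.write(traceback.format_exc())
        errors.flush()                # make sure the log entry is written
        return None                   # server-defined: report failure upward


def broken_app(environ, start_response):
    raise ValueError("boom")


log = io.StringIO()
outcome = call_safely(broken_app, {"wsgi.errors": log},
                      lambda status, headers: None)
```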


Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.
Servers that can run multiple requests in parallel *should* also
provide the option of running an application in a single-threaded
fashion, so that applications or frameworks that are not thread-safe
may still be used with that server.
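
One simple way to offer such an option is to serialize calls with a
lock.  The sketch below is an illustration in present-day Python 3,
with hypothetical names throughout.  It consumes the response body
while holding the lock, which protects the application but gives up
streaming:

```python
import threading
import time


def single_threaded(application):
    """Wrap an application so that only one thread runs it at a time."""
    lock = threading.Lock()

    def wrapper(environ, start_response):
        with lock:
            environ["wsgi.multithread"] = False
            result = application(environ, start_response)
            if result is None:
                return None
            try:
                return list(result)   # finish iterating before unlocking
            finally:
                if hasattr(result, "close"):
                    result.close()
    return wrapper


active = {"now": 0, "peak": 0}


def fragile_app(environ, start_response):
    """Hypothetical application that is not thread-safe."""
    active["now"] += 1
    active["peak"] = max(active["peak"], active["now"])
    time.sleep(0.01)
    active["now"] -= 1
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["ok\n"]


safe_app = single_threaded(fragile_app)
threads = [
    threading.Thread(target=safe_app, args=({}, lambda status, headers: None))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```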


Application Configuration
-------------------------

This specification does not define how a server selects or
obtains an application to invoke.  These and other configuration
options are highly server-specific matters.  It is expected that
server/gateway authors will document how to configure the server to
execute a particular application object, and with what options (such
as threading options).

Framework authors, on the other hand, should document how to create
an application object that wraps their framework's functionality.
The user, who has chosen both the server and the application
framework, must connect the two together.  However, since both the
framework and the server now have a common interface, this should
be merely a mechanical matter, rather than a significant engineering
effort for each new server/framework pair.


Middleware
----------

Note that a single object may play the role of a server with respect
to some application(s), while also acting as an application with
respect to some server(s).  Such "middleware" components can perform
such functions as:

   * Routing a request to different application objects based on the
     target URL, after rewriting the ``environ`` accordingly.

   * Allowing multiple applications or frameworks to run side-by-side
     in the same process

   * Load balancing and remote processing, by forwarding requests and
     responses over a network

   * Performing content postprocessing, such as applying XSL stylesheets

Given the existence of applications and servers conforming to this
specification, the appearance of such reusable middleware becomes
a possibility.
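
As an illustration of the first item, a routing component could be
sketched as below.  The code is present-day Python 3 and every name in
it (``make_router``, ``blog_app``, ``not_found``) is hypothetical.  The
router looks like an application to whatever calls it, and acts as the
server for the applications it dispatches to:

```python
def make_router(routes, default_app):
    """Hypothetical middleware: ``routes`` maps a URL prefix to an app."""
    def router(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in routes.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Rewrite environ so the target sees itself mounted at prefix.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return default_app(environ, start_response)
    return router


def blog_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["blog saw %s under %s\n"
            % (environ["PATH_INFO"], environ["SCRIPT_NAME"])]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-type", "text/plain")])
    return ["no such application\n"]


statuses = []

def record(status, headers):
    statuses.append(status)
    return lambda data: None          # a write() that discards its input


router = make_router({"/blog": blog_app}, not_found)
hit = router({"PATH_INFO": "/blog/2004/08"}, record)
miss = router({"PATH_INFO": "/wiki"}, record)
```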


Questions and Answers
=====================

1. Why must ``environ`` be a dictionary?  What's wrong with using
    a subclass?

    The rationale for requiring a dictionary is to maximize
    portability between servers.  The alternative would be to define
    some subset of a dictionary's methods as being the standard and
    portable interface.  In practice, however, most servers will
    probably find a dictionary adequate to their needs, and thus
    framework authors will come to expect the full set of dictionary
    features to be available, since they will be there more often
    than not.  But, if some server chooses *not* to use a dictionary,
    then there will be interoperability problems despite that
    server's "conformance" to spec.  Therefore, making a dictionary
    mandatory simplifies the specification and guarantees
    interoperability.

    Note that this does not prevent server or framework developers
    from offering specialized services as custom variables *inside*
    the ``environ`` dictionary.  This is the recommended approach
    for offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an
    iterator?  Shouldn't we pick just one way?

    If we supported only the iteration approach, then current
    frameworks that assume the availability of "push" suffer.
    But, if we only support pushing via ``write()``, then
    server performance suffers for transmission of e.g. large
    files (if a worker thread can't start on a new request
    until all of the output has been sent).  Thus, this compromise
    allows an application framework to support both approaches, as
    appropriate, but with only a little more burden to the server
    implementor than a push-only approach would require.
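
A minimal sketch of the two styles side by side, assuming (as in the
draft's ``simple_app`` example) that an application which only pushes
returns nothing.  ``run_once`` is a hypothetical toy server written for
illustration, not a conforming implementation.

```python
def push_app(environ, start_response):
    """Push style: send each piece through write() as it is produced."""
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('Hello ')
    write('world!\n')


def pull_app(environ, start_response):
    """Pull style: hand the server an iterator and let it drive output."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return iter(['Hello ', 'world!\n'])


def run_once(application, environ):
    """Toy server: collect the status and every body chunk, in order."""
    response = {'status': None, 'body': []}

    def start_response(status, headers):
        response['status'] = status
        return response['body'].append  # plays the role of write()

    result = application(environ, start_response)
    if result is not None:
        # Pull style: the server decides when to fetch each chunk.
        for chunk in result:
            response['body'].append(chunk)
    return response['status'], ''.join(response['body'])
```

The extra burden on the server is the single loop over ``result``;
everything else is shared between the two styles.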

3. What's the ``close()`` for?

    When writes are done during the execution of an application
    object, the application can ensure that resources are released
    using a try/finally block.  But, if the application returns an
    iterator, any resources used will not be released until the
    iterator is garbage collected.  The ``close()`` idiom allows
    an application to release critical resources at the end of a
    request, and it's forward-compatible with the support for
    try/finally in generators that's proposed by PEP 325.
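
A minimal sketch of the idiom, assuming the server looks for an optional
``close()`` method on whatever the application returned.  ``FileBody``
and ``send_body`` are hypothetical names used only for illustration.

```python
class FileBody:
    """Iterable response body that owns an open file-like object."""

    def __init__(self, fileobj, blocksize=8192):
        self.fileobj = fileobj
        self.blocksize = blocksize

    def __iter__(self):
        # Yield the file in blocks until it is exhausted.
        while True:
            data = self.fileobj.read(self.blocksize)
            if not data:
                break
            yield data

    def close(self):
        # Release the resource without waiting for garbage collection.
        self.fileobj.close()


def send_body(result, write):
    """Server side: transmit every chunk, then call close() if present."""
    try:
        for chunk in result:
            write(chunk)
    finally:
        close = getattr(result, 'close', None)
        if close is not None:
            close()
```

Because ``close()`` runs in a ``finally`` clause, the file is released
even if ``write`` raises part-way through the response.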

4. Why is this interface so low-level?  I want feature X!  (e.g.
    cookies, sessions, persistence, ...)

    This isn't Yet Another Python Web Framework.  It's just a way
    for frameworks to talk to web servers, and vice versa.  If you
    want these features, you need to pick a web framework that
    provides the features you want.  And if that framework lets
    you create a WSGI application, you should be able to run it
    in most WSGI-supporting servers.  Also, some WSGI servers may
    offer additional services via objects provided in their
    ``environ`` dictionary; see the applicable server documentation
    for details.  (Of course, applications that use such extensions
    will not be portable to other WSGI-based servers.)


Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose
thoughtful feedback made this revised draft possible.  Especially:

  * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who
    beat up on the first draft as not offering any advantages
    over "plain old CGI", thus encouraging me to look for a
    better approach.

  * Ian Bicking, who helped nag me into properly specifying
    the multithreading and multiprocess options, as well as
    badgering me to provide a mechanism for servers to supply
    custom extension data to an application.

  * Tony Lownds, who came up with the concept of a ``start_response``
    function that took the status and headers, returning a ``write``
    function.


References
==========

.. [1] The Python Wiki "Web Programming" topic
    (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    End:

From smulloni at smullyan.org  Mon Aug  9 04:03:23 2004
From: smulloni at smullyan.org (Jacob Smullyan)
Date: Mon Aug  9 04:03:25 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <20040809020323.GA21842@smullyan.org>

On Sun, Aug 08, 2004 at 07:59:14PM -0400, Phillip J. Eby wrote:
> Python currently boasts a wide variety of web application
> frameworks, such as Zope, Quixote, Webware, Skunkware, PSO,
> and Twisted Web -- to name just a few [1]_. 

If you must continue to call SkunkWeb SkunkWare, please be consistent
and call Webware Wareweb.

Cheers,

Jacob Smullyan

From pje at telecommunity.com  Mon Aug  9 06:00:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug  9 05:56:17 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <20040809020323.GA21842@smullyan.org>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040808235422.029de490@mail.telecommunity.com>

At 10:03 PM 8/8/04 -0400, Jacob Smullyan wrote:
>On Sun, Aug 08, 2004 at 07:59:14PM -0400, Phillip J. Eby wrote:
> > Python currently boasts a wide variety of web application
> > frameworks, such as Zope, Quixote, Webware, Skunkware, PSO,
> > and Twisted Web -- to name just a few [1]_.
>
>If you must continue to call SkunkWeb SkunkWare, please be consistent
>and call Webware Wareweb.
>
>Cheers,
>
>Jacob Smullyan

Crap.  Sorry about that.  Changing it in my file copy now.  (I'm surprised 
you didn't mention it back in December, though, as that error was in the 
first draft, too.)

On the bright side: out of dozens of frameworks, at least yours got 
mentioned.  ;)  Maybe I shouldn't have named any, since yours was the 
second framework I got the name wrong on.

From ianb at colorstudy.com  Wed Aug 11 08:42:22 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 11 08:42:27 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <4119BFCE.4080207@colorstudy.com>

It looks great to me.  Of course, I got all my wishes.  A couple smaller 
things, and some possible clarifications:

Phillip J. Eby wrote:
> Specification Overview
> ======================
> 
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side.  The server side invokes a callable
> object that is provided by the application side.  The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object.  Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.
> 
> The application object is simply a callable object that accepts
> two arguments.  The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object.  Here are two example application
> objects; one is a function, and the other is a class::
> 
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')

The callables are a little confusing to me.  The application is a 
callable.  Start_response is a callable.  It returns a callable.  Of 
course, if it wasn't a callable, it would be an object with only one 
method, which is kind of boring.

A contrary example to this would be iterators, which have basically one 
method in their interface (next); yet they are not simply callables.

I'm not of strong opinion, but the callables definitely make it harder 
to understand.

> ``environ`` Variables
> ---------------------
> 
> The ``environ`` dictionary is required to contain CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_.  In addition, it must contain the following WSGI-defined
> variables:
> 
> ====================   =============================================
> Variable               Value
> ====================   =============================================
> ``wsgi.version``       The string ``"1.0"``

Would it make sense for this to be a tuple, like (1, 0), like 
sys.version_info?
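
A small illustration of why the choice could matter, assuming versions
might one day be compared rather than just tested for equality.
``parse_version`` is a hypothetical helper, not something the draft
defines.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


# String comparison is lexicographic, so '1.10' sorts before '1.9'.
string_order = '1.10' < '1.9'

# Tuple comparison is numeric per component, so (1, 10) sorts after (1, 9).
tuple_order = parse_version('1.10') > parse_version('1.9')
```

Both ``string_order`` and ``tuple_order`` come out true, which is the
mismatch a tuple would avoid.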

> ``wsgi.input``         An input stream from which the HTTP request
>                        body can be read.
> 
> ``wsgi.errors``        An output stream to which error output can
>                        be written.  For most servers, this will be
>                        the server's error log.
> 
> ``wsgi.multithread``   This value should be true if the application
>                        object may be simultaneously invoked by
>                        another thread in the same process, and
>                        false otherwise.
> 
> ``wsgi.multiprocess``  This value should be true if an equivalent
>                        application object may be simultaneously
>                        invoked by another process, and false
>                        otherwise.
> ====================   =============================================

Another useful one I brought up last time would be some indication that 
the application was definitely not going to be reused, i.e., it's being 
invoked in a CGI context.  The performance issues there are completely 
different than in other environments.

Webware has a CGI interface, but it suffers from being really slow.  It 
could be faster, but everything is optimized toward the long-running 
case.  I think CGI could be made to perform better; putting in 
information to know when to do those optimizations would leave that door 
open.

Another common use case would be sessions.  It's best to preserve 
sessions over server restarts, but you might keep sessions in memory and 
only write to disk when the server shuts down.  If it's a CGI request, 
you can skip all that and just write to disk immediately.


> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

I think before we discussed being explicit about a couple variables. 
Specifically that SCRIPT_NAME should refer to the application's root, 
and PATH_INFO to everything that comes after.  This is in contrast to a 
situation where SCRIPT_NAME points to the WSGI server, and PATH_INFO to 
the application (in a case where the server hosts multiple applications 
at different URLs).  Your CGI example avoids this issue because it only 
supports one application, but a naive extension of that example to 
support more applications might improperly set these variables.

Should there be any policy about path segments containing //, ./, or ../?

Hmm... what should the server do if it gets a Location header with no 
Status?  I think Apache does an internal redirect, sometimes.  Should 
there be any notion of an internal redirect?  The CGI spec seems to 
require internal redirects in this case.

The CGI spec says servers should change the current working directory to 
the resource being run.  I think this won't be that common for WSGI 
servers, though.

I wonder if this will be an issue with imports.  Specifically, relative 
imports.  Eh, I guess that's an application issue.

Will GATEWAY_INTERFACE be defined?  If so, what value?  "WSGI/1.0"?  I 
assume SERVER_SOFTWARE will be up to the WSGI server.  Should they be 
sure to rewrite this value if these servers are nested?  E.g., should 
your CGI example rewrite that value?  It seems like each piece adds 
another name to the end in the format "name/version_number", where the 
name has no spaces.  And it might optionally have more information in 
parentheses after the version, which may contain spaces.  Maybe this 
should be a suggestion.

Is there any non-parsed header form?  This would be difficult to support 
in some environments.  Easy in BaseHTTPServer, but hard with a CGI server.

This is from the CGI spec:

    Scripts MUST be prepared to handled URL-encoded values in
    metavariables. In addition, they MUST recognise both "+" and
    "%20" in URL-encoded quantities as representing the space
    character. (See section 3.1.)

That seems weird; I've never URL-decoded values besides QUERY_STRING.

The CGI spec doesn't seem to mention REQUEST_URI.  That's surprising. 
Here are the Apache CGI variables it doesn't mention:

SERVER_SIGNATURE (pretty boring)
SERVER_ADDR (seems very basic)
DOCUMENT_ROOT (doesn't seem appropriate)
SCRIPT_FILENAME (also often not appropriate)
SERVER_ADMIN (boring)
SCRIPT_URI
REQUEST_URI (I don't understand the distinction)
REMOTE_PORT (boring, though I guess if you wanted to add an ident check 
it would be useful)
UNIQUE_ID (not needed)


I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially 
useful.  SCRIPT_URI and REQUEST_URI might be good.


For middleware application/servers, it might be suggested that they use 
mod_rewrite's extra variables 
(http://httpd.apache.org/docs/mod/mod_rewrite.html#EnvVar):

    This module keeps track of two additional (non-standard) CGI/SSI
    environment variables named SCRIPT_URL and SCRIPT_URI. These contain
    the logical Web-view to the current resource, while the standard CGI/SSI
    variables SCRIPT_NAME and SCRIPT_FILENAME contain the physical System-view.

    Notice: These variables hold the URI/URL as they were initially
    requested, i.e., before any rewriting. This is important because the
    rewriting process is primarily used to rewrite logical URLs to physical
    pathnames.

    Example:

    SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
    SCRIPT_FILENAME=/u/rse/.www/index.html
    SCRIPT_URL=/u/rse/
    SCRIPT_URI=http://en1.engelschall.com/u/rse/


From fredrik at pythonware.com  Wed Aug 11 13:04:50 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Aug 11 13:20:38 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <cfcugi$9hi$1@sea.gmane.org>

Phillip J. Eby wrote:

> As always, your comments and feedback are appreciated.
>      def run_with_cgi(application):
>
>          environ = {}
>          envrion.update(os.environ)

NameError

>          environ['wsgi.input']        = sys.stdin
>          environ['wsgi.errors']       = sys.stderr
>          environ['wsgi.version']      = '1.0'
>          environ['wsgi.multithread']  = False
>          environ['wsgi.multiprocess'] = True

The answer's probably hidden somewhere in the mailing list archives, but why
do you mix WSGI variables with external CGI environment variables?

I'd prefer

     def application(context, environ, start_response)

where context is an object of a server-defined type, with attributes for
input, errors, etc:

            context = MyApplicationServerContext()
            context.input = sys.stdin
            context.errors = sys.stderr
            context.version = "1.0"  # or (1, 0)
            # etc.

Advantages:
- contexts can (probably) be reused
- attributes can be lazily initialized (via properties or getattr hooks)
- the user code looks nicer
- future safe: more attributes and methods can be added to the context
  object in future revisions of this specification, without changing the
  function signatures

Disadvantages:
- one more argument; but if that's really a problem, why not make
  start_response a method of the context class?

        def application(context, environ):
            ...
            context.start(status, headers)


> The second parameter passed to the application object is itself a
> two-argument callable, used to begin the HTTP response and return
> a ``write()`` function.  The first parameter it takes is a "status"
> string, of the form ``"999 Message here"``, where ``999`` is replaced
> with the HTTP status code, and ``Message here`` is replaced with the
> appropriate message text.

To make life easier for users, you might wish to accept either an integer
status code (e.g. start(200, headers)) or a string.  In case a status code
is provided, the server can fill in a suitable string value (as per the HTTP
specification).

Except for those small nits, I'm totally +1 on this proposal.

</F>


From ianb at colorstudy.com  Wed Aug 11 17:54:36 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 11 17:55:29 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
In-Reply-To: <cfcugi$9hi$1@sea.gmane.org>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<cfcugi$9hi$1@sea.gmane.org>
Message-ID: <411A413C.2050803@colorstudy.com>

Fredrik Lundh wrote:
> Disadvantages:
> - one more argument; but if that's really a problem, why not make
>   start_response a method of the context class?
> 
>         def application(context, environ):
>             ...
>             context.start(status, headers)

This would solve the too-many-callables problem as well.

However, because the context could have a complex implementation, it 
would be hard to rewrite the context if you forward the request.  OTOH, 
most of the pieces of the context shouldn't be forwarded on.  For 
instance, if mod_python gives access to the apache module, or the 
original request object, should middleware pass through that access?  It 
would probably be incorrect, as the middleware is doing some filtering 
and the mod_python extensions would bypass that filtering.

Which is to say, middleware shouldn't pass through extensions by 
default, but with a dictionary implementation it would be common to 
do so.

One positive aspect of a dictionary is that introspection is easier. 
There's no reliable equivalent of .keys() for an arbitrary object.

And, if we package things into an object, environ could also become an 
attribute of context.


-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Wed Aug 11 18:44:18 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 18:44:23 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <4119BFCE.4080207@colorstudy.com>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>

At 01:42 AM 8/11/04 -0500, Ian Bicking wrote:

>The callables are a little confusing to me.  The application is a 
>callable.  Start_response is a callable.  It returns a callable.  Of 
>course, if it wasn't a callable, it would be an object with only one 
>method, which is kind of boring.
>
>A contrary example to this would be iterators, which have basically one 
>method in their interface (next); yet they are not simply callables.

It's assumed that iterators may have other behaviors.  In any case, I 
certainly made use of iterators and methods where appropriate, i.e. in the 
return value of the application, which can support __iter__(), next(), and 
close() if they are needed.


>I'm not of strong opinion, but the callables definitely make it harder to 
>understand.

...but easier to implement, since everything can be done with functions and 
closures.

Do you think you would have difficulty creating a conforming 
implementation, or are you just saying it took you a while to grasp how you 
would do so?


>>====================   =============================================
>>Variable               Value
>>====================   =============================================
>>``wsgi.version``       The string ``"1.0"``
>
>Would it make sense for this to be a tuple, like (1, 0), like 
>sys.version_info?

Maybe.  I'm not sure it makes any difference.  I could just as soon drop 
versioning altogether and just use the presence or absence of feature keys 
as the means of determining the version.


>Another useful one I brought up last time would be some indication that 
>the application was definitely not going to be reused, i.e., it's being 
>invoked in a CGI context.  The performance issues there are completely 
>different than in other environments.

Okay...  how about 'wsgi.last_call', which is a true value if this 
invocation of the application will *probably* be the last?  IOW, the server 
need not guarantee that the app will *not* be called again; this is just a 
"suggestion".
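
A rough sketch of how a framework might use such a hint for the session
case mentioned above, assuming the key ends up being called
``wsgi.last_call`` and that a missing key means "not the last call".
``SessionStore`` is hypothetical, and its ``disk`` dictionary merely
stands in for real persistent storage.

```python
class SessionStore:
    """Keeps sessions in memory and persists them only when needed."""

    def __init__(self):
        self.memory = {}
        self.disk = {}   # stand-in for a file or database

    def save(self, environ, session_id, data):
        self.memory[session_id] = data
        # In a CGI-like invocation there is no later shutdown hook to rely
        # on, so write through immediately.
        if environ.get('wsgi.last_call'):
            self.flush()

    def flush(self):
        """Persist everything currently held in memory."""
        self.disk.update(self.memory)
```

A long-running server would call ``flush()`` once at shutdown instead.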


>>.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
>
>I think before we discussed being explicit about a couple variables. 
>Specifically that SCRIPT_NAME should refer to the application's root, and 
>PATH_INFO to everything that comes after.

Good point; I'll update this.


>Should there be any policy about path segments containing //, ./, or ../?

What do you have in mind?


>Hmm... what should the server do if it gets a Location header with no Status?

There's no such thing; there's always a status under this spec.  However, 
what happens to the HTTP headers passed to 'start_response()' could perhaps 
be made clearer.


>The CGI spec says servers should change the current working directory to 
>the resource being run.  I think this won't be that common for WSGI 
>servers, though.

Do you think this needs to be stated?  WSGI only references CGI with 
respect to environment variables.


>Will GATEWAY_INTERFACE be defined?  If so, what value?  "WSGI/1.0"?  I 
>assume SERVER_SOFTWARE will be up to the WSGI server.  Should they be sure 
>to rewrite this value if these servers are nested?  E.g., should your CGI 
>example rewrite that value?  It seems like each piece adds another name to 
>the end in the format "name/version_number", where the name has no 
>spaces.  And it might optionally have more information in parentheses 
>after the version, which may contain spaces.  Maybe this should be a 
>suggestion.

The normal value of the CGI variables should be server-defined.  WSGI 
variables should be out-of-band.


>Is there any non-parsed header form?

The entire thing is "non-parsed headers".  They're a list of tuples.  If 
you mean, can you stop a web server from adding/changing headers according 
to its whims, then no, you can't.


>This is from the CGI spec:
>
>    Scripts MUST be prepared to handled URL-encoded values in
>    metavariables. In addition, they MUST recognise both "+" and
>    "%20" in URL-encoded quantities as representing the space
>    character. (See section 3.1.)
>
>That seems weird; I've never URL-decoded values besides QUERY_STRING.

That's probably an addition to the 1.1 spec.  However, ISTM I've seen code 
in Zope that expects to decode path segments.  I could be wrong.


>The CGI spec doesn't seem to mention REQUEST_URI.  That's surprising. 
>Here's the Apache CGI variables it doesn't mention:
>
>SERVER_SIGNATURE (pretty boring)
>SERVER_ADDR (seems very basic)
>DOCUMENT_ROOT (doesn't seem appropriate)
>SCRIPT_FILENAME (also often not appropriate)
>SERVER_ADMIN (boring)
>SCRIPT_URI
>REQUEST_URI (I don't understand the distinction)
>REMOTE_PORT (boring, though I guess if you wanted to add an ident check it 
>would be useful)
>UNIQUE_ID (not needed)
>
>
>I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially 
>useful.  SCRIPT_URI and REQUEST_URI might be good.

Sigh.  I guess maybe I'll have to go back and pick out variables one by 
one.  However, I don't think *any* of the variables you listed should be 
required to exist.  For one thing, it's much easier to write middleware if 
you only have to munge SCRIPT_NAME and PATH_INFO during traversals.

From pje at telecommunity.com  Wed Aug 11 18:52:25 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 18:52:29 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
In-Reply-To: <cfcugi$9hi$1@sea.gmane.org>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com>

At 01:04 PM 8/11/04 +0200, Fredrik Lundh wrote:
>Phillip J. Eby wrote:
>
> > As always, your comments and feedback are appreciated.
> >      def run_with_cgi(application):
> >
> >          environ = {}
> >          envrion.update(os.environ)
>
>NameError

[added to to-do list]


> >          environ['wsgi.input']        = sys.stdin
> >          environ['wsgi.errors']       = sys.stderr
> >          environ['wsgi.version']      = '1.0'
> >          environ['wsgi.multithread']  = False
> >          environ['wsgi.multiprocess'] = True
>
>The answer's probably hidden somewhere in the mailing list archives, but why
>do you mix WSGI variables with external CGI environment variables?
>
>I'd prefer
>
>      def application(context, environ, start_response)
>
>where context is an object of a server-defined type, with attributes for
>input, errors, etc:
>
>             context = MyApplicationServerContext()
>             context.input = sys.stdin
>             context.errors = sys.stderr
>             context.version = "1.0" (or (1, 0))
>             etc
>
>Advantages:
>- contexts can (probably) be reused
>- attributes can be lazily initialized (via properties or getattr hooks)
>- the user code looks nicer
>- future safe: more attributes and methods can be added to the context
>   object in future revisions of this specification, without changing the
>   function signatures

All of these advantages also apply to an object supplied in the dictionary, 
i.e.:

     environ['some_server.context'] = context_object


>Disadvantages:
>- one more argument; but if that's really a problem, why not make
>   start_response a method of the context class?
>
>         def application(context, environ):
>             ...
>             context.start(status, headers)

The advantage is simplicity of implementation.  It's possible to write 
middleware (application that's also a server) without creating any new 
classes.  In essence, WSGI is an almost pure-functional architecture, which 
makes it (IMO) easier to reason about.


> > The second parameter passed to the application object is itself a
> > two-argument callable, used to begin the HTTP response and return
> > a ``write()`` function.  The first parameter it takes is a "status"
> > string, of the form ``"999 Message here"``, where ``999`` is replaced
> > with the HTTP status code, and ``Message here`` is replaced with the
> > appropriate message text.
>
>To make life easier for users, you might wish to accept either an integer
>status code (e.g. start(200, headers)) or a string.  In case a status code
>is provided, the server can fill in a suitable string value (as per the HTTP
>specification).

I thought about this, but the difference between '200' and '"200 OK"' is 
so trivial as to be unuseful compared to the scope creep for the server's 
implementation.  That is, allowing this means the server software has to 
have a list of the numeric statuses, versus an application author looking 
up the few that they actually want to use.  Also, web frameworks often 
already have such a lookup table, so it seems to me that putting the 
responsibility on the application side is the better balance.
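
A minimal sketch of what the application-side lookup could look like,
using the ``responses`` table from the standard library's ``http.client``
module in present-day Python (an assumption of this sketch; a framework's
own table would work the same way).  ``status_line`` is a hypothetical
helper name.

```python
from http.client import responses


def status_line(code):
    """Build the "999 Message here" string from a numeric status code."""
    return '%d %s' % (code, responses.get(code, 'Unknown'))
```

An application would then call, for example,
``start_response(status_line(404), headers)``, and the server never needs
a table of its own.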


>Except for those small nits, I'm totally +1 on this proposal.

Thanks.  I'll add your questions to the Q&A section.

From pje at telecommunity.com  Wed Aug 11 18:57:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 18:57:35 2004
Subject: [Web-SIG] Re: The rewritten WSGI pre-PEP
In-Reply-To: <411A413C.2050803@colorstudy.com>
References: <cfcugi$9hi$1@sea.gmane.org>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<cfcugi$9hi$1@sea.gmane.org>
Message-ID: <5.1.1.6.0.20040811125244.03a790d0@mail.telecommunity.com>

At 10:54 AM 8/11/04 -0500, Ian Bicking wrote:
>However, because the context could have a complex implementation, it would 
>be hard to rewrite the context if you forward the request.  OTOH, most of 
>the pieces of the context shouldn't be forwarded on.  For instance, if 
>mod_python gives access to the apache module, or the original request 
>object, should middleware pass through that access?  It would probably be 
>incorrect, as the middleware is doing some filtering and the mod_python 
>extensions would bypass that filtering.
>
>Which is to say, middleware shouldn't pass through extensions by default, 
>but with a dictionary implementation it would be common to do so.

Actually, the idea behind the naming convention is that middleware can 
filter out extensions if it needs to.  It need only delete any lowercase 
key that doesn't begin with 'wsgi.' to remove all extensions, or it can be 
more specific, according to its needs.

I didn't actually mention this in the spec, though, so I'll need to fix that.
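
A minimal sketch of that filtering, assuming "lowercase key" means a key
with no uppercase letters at all.  ``strip_extensions`` is a hypothetical
helper name, and it returns a copy rather than deleting in place.

```python
def strip_extensions(environ):
    """Return a copy of environ without server-specific extension keys.

    CGI-style keys (uppercase) and 'wsgi.*' keys are kept; every other
    all-lowercase key is treated as a server extension and dropped.
    """
    return {
        key: value
        for key, value in environ.items()
        if not key.islower() or key.startswith('wsgi.')
    }
```

Middleware with narrower needs could test for one specific prefix, such
as a single server's namespace, instead of dropping everything.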


>One positive aspect of a dictionary is that introspection is easier. 
>There's no reliable equivalent of .keys() for an arbitrary object.
>
>And, if we package things into an object, environ could also become an 
>attribute of context.

I'm -1 on making an object out of it.  It will make the spec even longer 
than it already is, and it will increase the number of things to 
discuss.  (E.g. names of the methods).

From fredrik at pythonware.com  Wed Aug 11 19:15:16 2004
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed Aug 11 19:40:21 2004
Subject: [Web-SIG] Re: Re: The rewritten WSGI pre-PEP
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<cfcugi$9hi$1@sea.gmane.org>
	<5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com>
Message-ID: <cfdk3t$rvs$1@sea.gmane.org>

Phillip J. Eby wrote:

>>Advantages:
>>- contexts can (probably) be reused
>>- attributes can be lazily initialized (via properties or getattr hooks)
>>- the user code looks nicer
>>- future safe: more attributes and methods can be added to the context
>>   object in future revisions of this specification, without changing the
>>   function signatures
>
> All of these advantages also apply to an object supplied in the dictionary, i.e.:
>
>     environ['some_server.context'] = context_object

that's obviously not true: environment dictionaries cannot be reused,
environment items cannot be lazily initialized (since you require apps
to use a PyDict), and the code using WSGI variables has to use dict
access syntax (x["y"]) instead of standard attribute access (x.y).

> The advantage is simplicity of implementation.  It's possible to write middleware (application 
> that's also a server) without creating any new classes.

so "def a(b)" is easy to write, but "class a" is hard to write?

you're obviously not interested in feedback from experienced Python
programmers.  I'm sorry I wasted everybody's time.

</F> 


From ianb at colorstudy.com  Wed Aug 11 19:51:56 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 11 19:52:47 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
Message-ID: <411A5CBC.7080306@colorstudy.com>

Phillip J. Eby wrote:
>> I'm not of strong opinion, but the callables definitely make it harder 
>> to understand.
> 
> 
> ...but easier to implement, since everything can be done with functions 
> and closures.
> 
> Do you think you would have difficulty creating a conforming 
> implementation, or are you just saying it took you a while to grasp how 
> you would do so?

No, I don't think it would make it any harder to implement.  Mostly it's 
just harder to talk about.

>>> ====================   =============================================
>>> Variable               Value
>>> ====================   =============================================
>>> ``wsgi.version``       The string ``"1.0"``
>>
>>
>> Would it make sense for this to be a tuple, like (1, 0), like 
>> sys.version_info?
> 
> 
> Maybe.  I'm not sure it makes any difference.  I could just as soon drop 
> versioning altogether and just use the presence or absence of feature 
> keys as the means of determining the version.

I think of the version as something of a contract.  The WSGI server 
author can't deny that they intended to implement the full spec if they 
include the version number.  Also, it could be used the way the HTTP 
version sometimes is: you must include a Host header if you claim to be 
talking 1.1.  Similarly, applications could require certain features if 
the server claims to talk, say, WSGI 1.1.

>> Another useful one I brought up last time would be some indication 
>> that the application was definitely not going to be reused, i.e., it's 
>> being invoked in a CGI context.  The performance issues there are 
>> completely different than in other environments.
> 
> Okay...  how about 'wsgi.last_call', which is a true value if this 
> invocation of the application will *probably* be the last?  IOW, the 
> server need not guarantee that the app will *not* be called again; this 
> is just a "suggestion".

Yes, that sounds good.

>> Should there be any policy about path segments containing //, ./, or ../?
> 
> 
> What do you have in mind?

I don't know.  Normalization, perhaps -- remove empty path segments, and 
resolve any relative paths.  Which would mean something like:

import re
path = re.sub(r'/[^/]*/\.\./', '/', path)
path = re.sub(r'/\./', '/', path)
path = re.sub(r'//+', '/', path)

I dunno... that should probably be up to the application.
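
The same normalization (drop empty segments, resolve "." and "..") could also be sketched segment by segment, which avoids the corner cases of repeated regex substitution. The function name normalize_path is hypothetical, and whether a server or an application should do this at all is left open here.

```python
def normalize_path(path):
    """Collapse '//', './' and '../' in a URL path (illustrative only)."""
    segments = []
    for segment in path.split('/'):
        if segment in ('', '.'):
            continue                  # empty or current-directory segment
        if segment == '..':
            if segments:
                segments.pop()        # step back, but never above the root
            continue
        segments.append(segment)
    normalized = '/' + '/'.join(segments)
    # Keep a trailing slash, since it can be significant to the application.
    if path.endswith('/') and normalized != '/':
        normalized += '/'
    return normalized
```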

>> Hmm... what should the server do if it gets a Location header with no 
>> Status?
> 
> There's no such thing; there's always a status under this spec.  
> However, what happens to the HTTP headers passed to 'start_response()' 
> could perhaps be made clearer.

Okay, that's fine.  Though any internal redirect would have to be done 
through an extension in that case.  Though in practice internal 
redirects are kind of complicated to deal with anyway.  Lots of linking 
confusion, lost headers, etc.

>> The CGI spec says servers should change the current working directory 
>> to the resource being run.  I think this won't be that common for WSGI 
>> servers, though.
> 
> Do you think this needs to be stated?  WSGI only references CGI with 
> respect to environment variables.

Probably it's no big deal.

>> This is from the CGI spec:
>>
>>    Scripts MUST be prepared to handled URL-encoded values in
>>    metavariables. In addition, they MUST recognise both "+" and
>>    "%20" in URL-encoded quantities as representing the space
>>    character. (See section 3.1.)
>>
>> That seems weird; I've never URL-decoded values besides QUERY_STRING.
> 
> 
> That's probably an addition to the 1.1 spec.  However, ISTM I've seen 
> code in Zope that expects to decode path segments.  I could be wrong.

I would assume in that case it was decoding something that was encoded 
on the server side.  E.g.:

<a href="http://whatever.com/documents/I%2FO%20library">I/O library</a>

As opposed to the CGI gateway encoding any of its values.  Even 
QUERY_STRING is encoded by the browser, not the gateway.  Maybe this is 
just a case of HTTP issues leaking into the CGI spec.

>> The CGI spec doesn't seem to mention REQUEST_URI.  That's surprising. 
>> Here's the Apache CGI variables it doesn't mention:
>>
>> SERVER_SIGNATURE (pretty boring)
>> SERVER_ADDR (seems very basic)
>> DOCUMENT_ROOT (doesn't seem appropriate)
>> SCRIPT_FILENAME (also often not appropriate)
>> SERVER_ADMIN (boring)
>> SCRIPT_URI
>> REQUEST_URI (I don't understand the distinction)
>> REMOTE_PORT (boring, though I guess if you wanted to add an ident 
>> check it would be useful)
>> UNIQUE_ID (not needed)
>>
>>
>> I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially 
>> useful.  SCRIPT_URI and REQUEST_URI might be good.
> 
> 
> Sigh.  I guess maybe I'll have to go back and pick out variables one by 
> one.  However, I don't think *any* of the variables you listed should be 
> required to exist.  For one thing, it's much easier to write middleware 
> if you only have to munge SCRIPT_NAME and PATH_INFO during traversals.

I've had constant problems trying to backtrack through middleware (like 
mod_rewrite) to figure out how to create a URL that is internal to the 
application.  I'd like to keep around some artifact indicating what the 
original URI was (e.g., REQUEST_URI); something that middleware 
specifically should not rewrite.  Nor is there any real reason for it to 
be rewritten.

SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and should 
just be passed through any middleware.  Hmm... the CGI spec also leaves 
out any SSL variables.  Those are, of course, all optional.  But if the 
user connected via SSL, I think HTTPS=on should be required.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From tali.wang at gmail.com  Wed Aug 11 20:19:51 2004
From: tali.wang at gmail.com (Taliesin Wang)
Date: Wed Aug 11 20:19:56 2004
Subject: [Web-SIG] my view of python web app server
Message-ID: <abb391f804081111191d500ac5@mail.gmail.com>

Hi all,
I'm new here, and my English is not good, so please forgive any
grammar or spelling mistakes.

My view on a Python web app server:

1. Trying to implement an app server the Java way is not valuable.
Going this way, the best result we could achieve is a "Tomcat"
clone.

2. The advantage of Python, I think, is in ORM. In Java, we need to
do a lot of work to get around the limitations of strong typing, but
in Python it could be much easier.

3. For high-traffic websites, a cache is used to avoid heavy db
operations. In Java, disk/memory operations are not flexible
enough, but in Python they are easier.

Based on the ideas above, I suggest:

1. We start with ORM and have a Python version of Hibernate first,
as 80% of web programming is based on db operations (add/edit/remove a
record). These objects could be named PWO (Python Web Object).
2. Leave the main space to those PWOs, and treat the
request/response/session concepts as "helper" classes that serve PWOs.
3. Integrate the db driver tightly into the app server.
(An embedded template engine is welcome, too.)

The final goal is that the web app developer need not know what db
they use or what the database schema is: just write PWOs,
deploy them, then organize them by business logic.


-- 
Wang
-----------------------------------------------
Email:tali.wang@gmail.com
Mobile:+86-136-3281-4194
-----------------------------------------------
From pje at telecommunity.com  Wed Aug 11 20:30:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 20:30:57 2004
Subject: [Web-SIG] Re: Re: The rewritten WSGI pre-PEP
In-Reply-To: <cfdk3t$rvs$1@sea.gmane.org>
References: <5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<cfcugi$9hi$1@sea.gmane.org>
	<5.1.1.6.0.20040811124440.03a6cd20@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040811140110.02a03770@mail.telecommunity.com>

At 07:15 PM 8/11/04 +0200, Fredrik Lundh wrote:
>Phillip J. Eby wrote:
>
> >>Advantages:
> >>- contexts can (probably) be reused
> >>- attributes can be lazily initialized (via properties or getattr hooks)
> >>- the user code looks nicer
> >>- future safe: more attributes and methods can be added to the context
> >>   object in future revisions of this specification, without changing the
> >>   function signatures
> >
> > All of these advantages also apply to an object supplied in the
> > dictionary, i.e.:
> >
> >     environ['some_server.context'] = context_object
>
>that's obviously not true: environment dictionaries cannot be reused,
>environment items cannot be lazily initialized (since you require apps
>to use a PyDict), and the code using WSGI variables has to use dict
>access syntax (x["y"]) instead of standard attribute access (x.y).

I meant that a server-specific 'context_object' can have all those 
advantages, not that the dictionary would.  In other words, I was 
suggesting that WSGI extensions could make use of all of these things, but 
I'd prefer that the core WSGI variables weren't presented that way.

Given that all the WSGI-defined keys are strings or booleans, except for 
'input' and 'errors', I don't see the advantage of lazy initialization for 
the spec-defined values.  I will agree that user code would look nicer as 
attributes, but there are other ways to accomplish that, such as using 
constants for keys, e.g. 'environ[INPUT]', or functions 'input_of(environ)'.

As for future safety, you can add as many framework- or server-specific 
keys, as long as you follow the naming convention.  And those entries can 
be objects of whatever nature is desired.

So really, the only thing that an object *adds* is a '.' syntax.  But, this 
syntax doesn't easily allow for namespaces: if server A and server B both 
define a 'foo' method, but with different signatures, how can an 
application tell what kind of 'foo' it is?  At least with a dictionary, the 
application object can look for 'server_A.foo' and 'server_B.foo' keys.
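
A minimal sketch of that lookup, using the 'server_A.foo' and 'server_B.foo' keys from the paragraph above; find_foo is a hypothetical helper, not part of the draft.

```python
def find_foo(environ):
    """Return (key, extension) for whichever server-specific 'foo' is present."""
    for key in ('server_A.foo', 'server_B.foo'):
        if key in environ:
            # The key prefix tells the application which signature to expect.
            return key, environ[key]
    return None, None
```

With a single context object, both servers would have to share one attribute name, and the application would have no comparable way to tell the two signatures apart.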

Finally, although I do want it to be simple on both the server and app 
sides, please remember that this is primarily intended to be a 
server-to-framework protocol, not an API for writing applications.  It's 
expected that normally the only code dealing directly with the WSGI 
'environ' is either framework code, "middleware", or a server.


> > The advantage is simplicity of implementation.  It's possible to write
> > middleware (application that's also a server) without creating any new
> > classes.
>
>so "def a(b)" is easy to write, but "class a" is hard to write?

No, it's that one can write code to transform the dictionary in-place, 
while supplying an altered context object will require not just an extra 
class, but potentially error-prone code to copy attributes and delegate 
methods to the previous context object.  That is, unless the spec requires 
that the context object allow arbitrary attributes to be set, or provides 
some extension dictionary, there's no way for a "middleware" component to 
add new behaviors.  Whereas, under the present spec, it's just 
'environ[somekey]=value', and pass the call on to the next "application" 
object.
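
As a rough sketch of that pattern, assuming the (environ, start_response) calling convention discussed in this thread: add_key, demo_app and the 'myframework.greeting' key are hypothetical names used only for illustration.

```python
def add_key(application, key, value):
    """Wrap an application so it sees one extra environ entry."""
    def middleware(environ, start_response):
        environ[key] = value                          # transform the dict in place
        return application(environ, start_response)   # pass the call on
    return middleware

def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['myframework.greeting'].encode('ascii')]

wrapped = add_key(demo_app, 'myframework.greeting', 'hello')
```

No class is needed on either side: the wrapper is a closure, and the new behaviour travels in the dictionary.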


>you're obviously not interested in feedback from experienced Python
>programmers.  I'm sorry I wasted everybody's time.

Huh?  I thought we were just getting started.  Ian argued with me for weeks 
to get a lot of the stuff in this draft that he wanted, so I'm not closed 
to feedback, just thickheaded.  :)  I'm also prone to not communicating all 
of my assumptions/conclusions about a design, because I think they're 
"obvious".  So, feedback like yours forces me to elaborate on them, and if 
you can get me to understand your actual use case I'll try to incorporate 
it.  If it means redoing the whole spec, so be it -- if you search for my 
first posting of the spec last December, you'll notice that this version is 
almost *nothing* like the original, which had objects and methods galore, 
by comparison.

From pje at telecommunity.com  Wed Aug 11 20:40:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 11 20:41:02 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <411A5CBC.7080306@colorstudy.com>
References: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com>

At 12:51 PM 8/11/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>>>====================   =============================================
>>>>Variable               Value
>>>>====================   =============================================
>>>>``wsgi.version``       The string ``"1.0"``
>>>
>>>
>>>Would it make sense for this to be a tuple, like (1, 0), like 
>>>sys.version_info?
>>
>>Maybe.  I'm not sure it makes any difference.  I could just as soon drop 
>>versioning altogether and just use the presence or absence of feature 
>>keys as the means of determining the version.
>
>I think of the version as something of a contract.  The WSGI server author 
>can't deny that they intended to implement the full spec if they include 
>the version number.  Also it could be used like HTTP 1.1 sometimes is, 
>like you must include a Host header if you claim to be talking 
>1.1.  Similarly applications could require certain features if the server 
>claims to talk, say, WSGI 1.1.

Fair enough.  Unless anybody else has any input one way or the other, we'll 
make it the tuple (1,0).
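
A small sketch of why a tuple is convenient for the kind of contract check Ian describes; the helper name supports is hypothetical.

```python
def supports(environ, required=(1, 0)):
    """Return True if the server claims at least the required WSGI version."""
    version = environ.get('wsgi.version')
    # Tuples compare element by element, so (1, 10) >= (1, 2) is True,
    # whereas comparing the strings "1.10" and "1.2" would order them wrongly.
    return isinstance(version, tuple) and version >= required
```

An application that needs a later feature could then refuse to run, or fall back, when supports(environ, (1, 1)) is false.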


>I've had constant problems trying to backtrack through middleware (like 
>mod_rewrite) to figure out how to create a URL that is internal to the 
>application.  I'd like to keep around some artifact indicating what the 
>original URI was (e.g., REQUEST_URI); something that middleware 
>specifically should not rewrite.  Nor is there any real reason for it to 
>be rewritten.

Hm.  And SCRIPT_NAME is insufficient for this?  I think I can see why 
mod_rewrite would make this a problem, but ISTM that Python middleware 
component could do rewrites that left SCRIPT_NAME "logically correct".

I'm more concerned that the presence of such a variable would encourage 
people to use it in ways that would ignore "rewritten" variables, thus 
breaking middleware.  Meanwhile, the common solution I've seen to this 
issue in web applications is to have configuration for where the 
application is in URL-space.


>SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and should 
>just be passed through any middleware.

Are you sure?  SERVER_ADDR might be different if the request is forwarded 
to another machine, mightn't it?  I seem to recall that mod_backhand does 
some stuff with this.  In any case it highlights the trouble with trying to 
precisely pin down things that are already inherently 
implementation-defined.  Unfortunately, WSGI isn't really going to 
eliminate all the environment introspecting and munging code that lives in 
the various existing apps and frameworks today.


>Hmm... the CGI spec also leaves out any SSL variables.  Those are, of
>course, all optional.  But if the user connected via SSL, I think
>HTTPS=on should be required.

I'll add something about this, and maybe some sort of a general note about 
the inherent implementation-specificness of CGI variables.  :(

From ianb at colorstudy.com  Wed Aug 11 21:20:23 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 11 21:21:11 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com>
References: <5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040808194852.023f7490@mail.telecommunity.com>
	<5.1.1.6.0.20040811122521.01ea8c60@mail.telecommunity.com>
	<5.1.1.6.0.20040811143129.02974c30@mail.telecommunity.com>
Message-ID: <411A7177.5040108@colorstudy.com>

Phillip J. Eby wrote:
>> I've had constant problems trying to backtrack through middleware 
>> (like mod_rewrite) to figure out how to create a URL that is internal 
>> to the application.  I'd like to keep around some artifact indicating 
>> what the original URI was (e.g., REQUEST_URI); something that 
>> middleware specifically should not rewrite.  Nor is there any real 
>> reason for it to be rewritten.
> 
> 
> Hm.  And SCRIPT_NAME is insufficient for this?  I think I can see why 
> mod_rewrite would make this a problem, but ISTM that Python middleware 
> component could do rewrites that left SCRIPT_NAME "logically correct".

I suppose it could, i.e., http:// + SERVER_NAME + ":" + SERVER_PORT + 
SCRIPT_NAME + PATH_INFO + "?" + QUERY_STRING is the complete URL.  If 
that's the expectation, then that too should be in the spec.  But, if 
only because of the existence of mod_rewrite, that's not likely to be 
true.  REQUEST_URI just seems like a natural part of the request 
description -- it says exactly what the client asked for, without the 
extra meaning that SCRIPT_NAME and PATH_INFO have.

In the end I've come to dislike mod_rewrite because of these issues, but 
given its existence...

>> SERVER_ADDR and REMOTE_PORT also don't require any rewriting, and 
>> should just be passed through any middleware.
> 
> 
> Are you sure?  SERVER_ADDR might be different if the request is 
> forwarded to another machine, mightn't it?  I seem to recall that 
> mod_backhand does some stuff with this.  In any case it highlights the 
> trouble with trying to precisely pin down things that are already 
> inherently implementation-defined.  Unfortunately, WSGI isn't really 
> going to eliminate all the environment introspecting and munging code 
> that lives in the various existing apps and frameworks today.

If SERVER_ADDR needs to be rewritten, then SERVER_NAME would be 
rewritten at the same time.

I think I've also seen some inconsistencies of SERVER_NAME and 
HTTP_HOST.  SERVER_NAME tends to be the canonical name of the host, 
ignoring any named virtual hosts (at least in Apache).  So really if you 
are going to construct a URL it should use (environ.get("HTTP_HOST") or 
environ.get("SERVER_NAME")).

Maybe it would be good to include how the URL is supposed to be split 
up, at least informationally.  Like, you can reconstruct the URL by doing:

if environ.get('HTTPS') == 'on':
    scheme, default_port = 'https', '443'
else:
    scheme, default_port = 'http', '80'
url = scheme + '://'
if environ.get('HTTP_HOST'):
    # The Host header already carries any non-default port.
    url += environ['HTTP_HOST']
else:
    url += environ['SERVER_NAME']
    if environ['SERVER_PORT'] != default_port:
        url += ':' + environ['SERVER_PORT']
url += environ['SCRIPT_NAME']
url += environ.get('PATH_INFO', '')
if environ.get('QUERY_STRING'):
    url += '?' + environ['QUERY_STRING']


This should never fail (no missing keys), and should always be accurate 
except for details like a ? without a query string, an explicit port 
that matches the default, or a path that the server has optionally 
normalized.

If it can't be accurate -- e.g., because SCRIPT_NAME or PATH_INFO have 
been muddled (or even QUERY_STRING) -- then I'd like to have a 
REQUEST_URI which is accurate.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From angryhicKclown at netscape.net  Sat Aug 14 19:42:22 2004
From: angryhicKclown at netscape.net (angryhicKclown@netscape.net)
Date: Sat Aug 14 19:42:28 2004
Subject: [Web-SIG] WSGI - alternative ideas
Message-ID: <1B619C81.6DDEF2D7.519F8DB3@netscape.net>

Hi, I've just subscribed to this list, but I've read much of the archives. Python is in dire and immediate need of WSGI.

I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net), except without the templating. Jonpy exposes an interface very similar to Java servlets, and can run on cgi, fastcgi, and mod_python by changing one line of code. WSGI, I believe, should be a higher-level interface than what has been currently outlined. For Python to succeed as a web language (and I believe that it will), it needs to support the following out of the box:

- a clean servlet interface, see jonpy's Handler classes
- support for a multitude of different platforms easily
- sessions
- database connection pooling
- caching

The syntax for something like this would be as follows:

-------------------------

import wsgi

class MyServlet(wsgi.Servlet): # perhaps a different name than Servlet?
    def handle(self, req, **formargs):
        pass

wsgi.main(MyServlet())

------------------

The wsgi module should automatically detect if it's running under CGI, mod_python, fastcgi, PyWX, or even IIS ASP with Python ActiveX scripting or ISAPI. The request args are passed as key=value, unless there are multiple values for one key, in which case the values are passed as a list.

The request object would support sessions via a "req.sessions" dict. WSGI would pick the storage method it uses depending on what platform it is run on.

It would also support a database pool by using a "req.pool" object. I believe it should support pooling of any type of class. Here's an idea for syntax:

req.pool['database'] = (MySQLdb.connect, {'user':'example','passwd':'secret','db':'example'})

And a call to req.pool['database'] would check out a connection to that database, and would be automatically returned at the end of the request.


Or am I taking this at too high a level? Perhaps it should simply clone the cgi module for different platforms (i.e. from wsgi import cgi, from wsgi import mod_python), or, perhaps the wsgi module will expose the same interface as the cgi module, and autodetect the platform and act accordingly.


Thanks for reading,

Peter Hunt

From pje at telecommunity.com  Sat Aug 14 19:53:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 14 19:54:20 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <1B619C81.6DDEF2D7.519F8DB3@netscape.net>
Message-ID: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>

At 01:42 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote:
>Hi, I've just subscribed to this list, but I've read much of the archives. 
>Python is in dire and immediate need of WSGI.
>
>I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net), 
>except without the templating. Jonpy exposes an interface very similar to 
>Java servlets, and can run on cgi, fastcgi, and mod_python by changing one 
>line of code. WSGI, I believe, should be a higher-level interface than 
>what has been currently outlined. For Python to succeed as a web language 
>(and I believe that it will), it needs to support the following out of the box:
>
>- a clean servlet interface, see jonpy's Handler classes
>- support for a multitude of different platforms easily
>- sessions
>- database connection pooling
>- caching

These needs are already served by dozens of Python web frameworks.  To 
duplicate even *one* of these facilities in the WSGI specification simply 
adds to the number of existing web frameworks, without fixing 
anything.  WSGI is *intentionally* primitive, to minimize the number of 
things that different frameworks disagree on.

Unfortunately, *everybody* wants to write the "framework to end all 
frameworks", but this always just results in the existence of framework 
number N+1.  To really change the status quo, there *must* exist something 
which is *not* a framework.

WSGI can reach critical mass if a sufficient number of popular frameworks 
and servers support it.  By contrast, a new framework must successfully 
"recruit" *individual* users of existing frameworks who have (potentially) 
already written quite a lot of code to that framework's API.

A new framework also threatens the value of the investments existing 
framework authors have made, and therefore does not encourage their 
participation in "cannibalizing" their work!

From mnot at mnot.net  Sat Aug 14 20:25:25 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sat Aug 14 20:25:30 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
Message-ID: <4FE70A4B-EE1F-11D8-9BC6-000A95BD86C0@mnot.net>

+1

FWIW, I really like this; I'm going to code something up and see how it 
goes, but from a first look, this is *exactly* what the world needs.

My comments so far (based on revision 1.1):

- in general, it would be helpful if references to external specs and 
constructs they define (e.g., CGI, HTTP, URI) had explicit links and 
section numbers, so we all are talking about the same things.

- in "The start_response() Callable", trailing CRs or LFs are 
forbidden; what about those inside the text? Multi-line HTTP headers 
are legal...

- it would be helpful if you gave distinguished names to the different 
callables flying around, and perhaps included an illustration; it gets 
confusing.

- could you talk a bit about the choice of using an environment 
dictionary for requests? In particular, I understand that CGI-style 
environment variables makes things easy for CGI implementations, 
potentially at the expense of others; why not do a list of tuples -- in 
the style that you describe for response headers?

Cheers,


On Aug 14, 2004, at 10:53 AM, Phillip J. Eby wrote:

> At 01:42 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote:
>> Hi, I've just subscribed to this list, but I've read much of the 
>> archives. Python is in dire and immediate need of WSGI.
>>
>> I think WSGI needs to be essentially very similar to jonpy 
>> (jonpy.sf.net), except without the templating. Jonpy exposes an 
>> interface very similar to Java servlets, and can run on cgi, fastcgi, 
>> and mod_python by changing one line of code. WSGI, I believe, should 
>> be a higher-level interface than what has been currently outlined. 
>> For Python to succeed as a web language (and I believe that it will), 
>> it needs to support the following out of the box:
>>
>> - a clean servlet interface, see jonpy's Handler classes
>> - support for a multitude of different platforms easily
>> - sessions
>> - database connection pooling
>> - caching
>
> These needs are already served by dozens of Python web frameworks.  To 
> duplicate even *one* of these facilities in the WSGI specification 
> simply adds to the number of existing web frameworks, without fixing 
> anything.  WSGI is *intentionally* primitive, to minimize the number 
> of things that different frameworks disagree on.
>
> Unfortunately, *everybody* wants to write the "framework to end all 
> frameworks", but this always just results in the existence of 
> framework number N+1.  To really change the status quo, there *must* 
> exist something which is *not* a framework.
>
> WSGI can reach critical mass if a sufficient number of popular 
> frameworks and servers support it.  By contrast, a new framework must 
> successfully "recruit" *individual* users of existing frameworks who 
> have (potentially) already written quite a lot of code to that 
> framework's API.
>
> A new framework also threatens the value of the investments existing 
> framework authors have made, and therefore does not encourage their 
> participation in "cannibalizing" their work!

--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Sat Aug 14 21:05:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 14 21:06:21 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <4FE70A4B-EE1F-11D8-9BC6-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>

At 11:25 AM 8/14/04 -0700, Mark Nottingham wrote:
>+1
>
>FWIW, I really like this; I'm going to code something up and see how it 
>goes, but from a first look, this is *exactly* what the world needs.
>
>My comments so far (based on revision 1.1):
>
>- in general, it would be helpful if references to external specs and
>constructs they define (e.g., CGI, HTTP, URI) had explicit links and
>section numbers, so we all are talking about the same things.
>
>- in "The start_response() Callable", trailing CRs or LFs are forbidden; 
>what about those inside the text? Multi-line HTTP headers are legal...

In the next draft, I'll be drilling into these issues more, per Ian and 
Fredrik's comments earlier this week.  Specifically, I'm going to go into a 
lot more detail about *which* CGI variables are required to be available, 
and what headers must be supplied by the application object versus those 
which must be supplied by the server if not present.


>- it would be helpful if you gave distinguished names to the different 
>callables flying around, and perhaps included an illustration; it gets 
>confusing.

Well, the only one that doesn't have an explicit name is the 'write' 
callable, and I can fix that by calling it "the 'write' callable".  Some 
editing should ensure that all callables are referred to with some kind of 
name nearby.


>- could you talk a bit about the choice of using an environment dictionary 
>for requests? In particular, I understand that CGI-style environment 
>variables makes things easy for CGI implementations, potentially at the 
>expense of others; why not do a list of tuples -- in the style that you 
>describe for response headers?

Because CGI variables aren't ordered.  If the request input were required 
to be HTTP headers, this would make it impossible for CGI, FastCGI and 
other gateways defined in terms of CGI to serve as valid WSGI implementations.

From mnot at mnot.net  Sat Aug 14 22:21:15 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sat Aug 14 22:21:19 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
References: <5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
Message-ID: <7E38FB6A-EE2F-11D8-9BC6-000A95BD86C0@mnot.net>


On Aug 14, 2004, at 12:05 PM, Phillip J. Eby wrote:

> Well, the only one that doesn't have an explicit name is the 'write' 
> callable, and I can fix that by calling it "the 'write' callable".  
> Some editing should ensure that all callables are referred to with 
> some kind of name nearby.

Great. Maybe something more descriptive, like writeResponseBody?

>> - could you talk a bit about the choice of using an environment 
>> dictionary for requests? In particular, I understand that CGI-style 
>> environment variables makes things easy for CGI implementations, 
>> potentially at the expense of others; why not do a list of tuples -- 
>> in the style that you describe for response headers?
>
> Because CGI variables aren't ordered.  If the request input were 
> required to be HTTP headers, this would make it impossible for CGI, 
> FastCGI and other gateways defined in terms of CGI to serve as valid 
> WSGI implementations.

Sorry, I don't follow. HTTP headers aren't ordered, except within a 
particular header field-name. It's trivial to map from a dictionary 
like {"HTTP_REFERER": "http://www.example.com/", "HTTP_HOST": 
"www.example.org"} to [("referer", "http://www.example.com/"), ("host", 
"www.example.org")]. No information is lost, and it's easier for 
non-CGI implementations to work with.

This isn't to say that the entire environment should be in this style; 
I'm just concerned about the HTTP headers.
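
The mapping described above can be sketched as follows; headers_from_environ is a hypothetical helper, and the result is sorted only to make the output deterministic.

```python
def headers_from_environ(environ):
    """Turn CGI-style HTTP_* keys into (name, value) header tuples."""
    headers = []
    for key, value in environ.items():
        # CGI exposes most request headers with an HTTP_ prefix;
        # CONTENT_TYPE and CONTENT_LENGTH are the unprefixed exceptions.
        if key.startswith('HTTP_'):
            name = key[len('HTTP_'):].replace('_', '-').lower()
            headers.append((name, value))
    return sorted(headers)

environ = {
    'HTTP_REFERER': 'http://www.example.com/',
    'HTTP_HOST': 'www.example.org',
    'REQUEST_METHOD': 'GET',
}
print(headers_from_environ(environ))
```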


One other thing -- as far as I can tell, this interface can't 
accommodate Expect/100-Continue interactions, as specified in RFC2616 
section 8.2.3. I.e., to support this, the application needs to be given 
access to the request headers before reading the request body, so that 
it can send a 100 Continue, then read the request body, then send a 
normal response.

I think it would be possible to support this feature without unduly 
burdening implementations by saying that start_response() can be called 
a second time, IF there is an expect: 100-continue header in the 
request. Server implementations which don't support this behaviour, or 
automatically handle it themselves, can filter out that header.

E.g., if the request environment contains the expect: 100-continue 
header, the application can do one of three things;

1) respond as normal; i.e., call start_response() and send a 
successful, redirect, or error response (possibly blocking until the 
request body is received)

2) respond with a 417 Expectation Failed status code

3) respond with a 100 Continue in the first call to start_response(), 
and then call start_response() again to make the actual response.

Thoughts?

Keep up the good work!

--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Sun Aug 15 00:11:10 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Aug 15 00:11:37 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <7E38FB6A-EE2F-11D8-9BC6-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com>

At 01:21 PM 8/14/04 -0700, Mark Nottingham wrote:
>On Aug 14, 2004, at 12:05 PM, Phillip J. Eby wrote:
>>Because CGI variables aren't ordered.  If the request input were required 
>>to be HTTP headers, this would make it impossible for CGI, FastCGI and 
>>other gateways defined in terms of CGI to serve as valid WSGI implementations.
>
>Sorry, I don't follow. HTTP headers aren't ordered, except within a 
>particular header field-name. It's trivial to map from a dictionary like 
>{"HTTP_REFERER": "http://www.example.com/", "HTTP_HOST": 
>"www.example.org"} to [("referer", "http://www.example.com/"), ("host", 
>"www.example.org")]. No information is lost, and it's easier for non-CGI 
>implementations to work with.

Hm.  How many such frameworks are there?  I'm making the implicit 
assumption that web servers that know how to create CGI variables are more 
common than frameworks that don't use CGI variables.  If that's not a valid 
assumption, then perhaps the decision should be revisited.


>One other thing -- as far as I can tell, this interface can't accommodate 
>Expect/100-Continue interactions, as specified in RFC2616 section 8.2.3. 
>I.e., to support this, the application needs to be given access to the 
>request headers before reading the request body, so that it can send a 100 
>Continue, then read the request body, then send a normal response.

Actually, this *could* work under the current spec, so long as the 
application manages the second response.  However, there are pieces that 
need to be better defined, such as what happens if the application writes 
more data than is present in the outgoing 'Content-Length' header.


>I think it would be possible to support this feature without unduly 
>burdening implementations by saying that start_response() can be called a 
>second time, IF there is an expect: 100-continue header in the request. 
>Server implementations which don't support this behaviour, or 
>automatically handle it themselves, can filter out that header.
>
>E.g., if the request environment contains the expect: 100-continue header, 
>the application can do one of three things:
>
>1) respond as normal; i.e., call start_response() and send a successful, 
>redirect, or error response (possibly blocking until the request body is 
>received)
>
>2) respond with a 417 Expectation Failed status code
>
>3) respond with a 100 Continue in the first call to start_response(), and 
>then call start_response() again to make the actual response.
>
>Thoughts?

Unfortunately, I have no experience with this aspect of HTTP/1.1.  I fear I 
shall end up having to study the RFC extensively before drawing a 
conclusion on this.  :(

It seems to me that another approach is possible, though...  couldn't the 
web server just send a 100 Continue response if there's an "expect: 
100-continue"  header in the request, and you attempt to read from the 
input stream before you've called the 'start_response' callable?  At first 
glance, this sounds like a reasonable way to handle it, that wouldn't 
require any explicit handling by the application code.  Then, WSGI could 
also require that such an "expect" header must NOT appear in the request 
passed to an application.
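
A rough sketch of how a server might do that, in present-day Python.
Everything here is an assumption made for illustration: the names
ContinueOnRead and make_input, the send_continue callback that writes the
interim response, and the cancel() hook the server would call from its own
start_response.

```python
import io


class ContinueOnRead:
    """Wrap a request body; emit '100 Continue' before the first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue  # hypothetical server callback
        self._pending = True

    def cancel(self):
        # The server calls this from start_response(): once a final
        # response has begun, no interim 100 Continue should be sent.
        self._pending = False

    def read(self, size=-1):
        if self._pending:
            self._pending = False
            self._send_continue()
        return self._stream.read(size)


def make_input(environ, body, send_continue):
    """Server side: hide the Expect header and wrap the body if needed."""
    expect = environ.pop("HTTP_EXPECT", "")
    if expect.lower() == "100-continue":
        return ContinueOnRead(body, send_continue)
    return body


sent = []
environ = {"HTTP_EXPECT": "100-continue"}
stream = make_input(environ, io.BytesIO(b"payload"),
                    lambda: sent.append("100 Continue"))
data = stream.read()   # first read triggers the interim response
stream.read()          # later reads do not repeat it
```

The application never sees the Expect header; it simply reads its input.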

From mnot at mnot.net  Sun Aug 15 01:01:21 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sun Aug 15 01:01:27 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com>
References: <5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814134326.02b43e80@mail.telecommunity.com>
	<5.1.1.6.0.20040814144604.03088180@mail.telecommunity.com>
	<5.1.1.6.0.20040814175553.02681ab0@mail.telecommunity.com>
Message-ID: <DBFF41C8-EE45-11D8-9BC6-000A95BD86C0@mnot.net>


On Aug 14, 2004, at 3:11 PM, Phillip J. Eby wrote:
> It seems to me that another approach is possible, though...  couldn't 
> the web server just send a 100 Continue response if there's an 
> "expect: 100-continue"  header in the request, and you attempt to read 
> from the input stream before you've called the 'start_response' 
> callable?  At first glance, this sounds like a reasonable way to 
> handle it, that wouldn't require any explicit handling by the 
> application code.  Then, WSGI could also require that such an "expect" 
> header must NOT appear in the request passed to an application.

That sounds very reasonable...

--
Mark Nottingham     http://www.mnot.net/

From angryhicKclown at netscape.net  Sun Aug 15 02:27:28 2004
From: angryhicKclown at netscape.net (angryhicKclown@netscape.net)
Date: Sun Aug 15 02:27:32 2004
Subject: [Web-SIG] Re: WSGI - alternative ideas
Message-ID: <77A88906.512D2578.519F8DB3@netscape.net>

Thanks for replying.

Python does need sessions, pooling, and caching, but, as I understand it, they would be implemented as separate modules on top of WSGI?

And how about simply creating a wsgi module that emulates the cgi module, except works across different web platforms?

From pje at telecommunity.com  Sun Aug 15 06:26:43 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Aug 15 06:27:12 2004
Subject: [Web-SIG] Re: WSGI - alternative ideas
In-Reply-To: <77A88906.512D2578.519F8DB3@netscape.net>
Message-ID: <5.1.1.6.0.20040815002103.035e7ec0@mail.telecommunity.com>

At 08:27 PM 8/14/04 -0400, angryhicKclown@netscape.net wrote:
>Thanks for replying.
>
>Python does need sessions, pooling, and caching, but, as I understand it, 
>they would be implemented as separate modules on top of WSGI?

More precisely, the idea is to convince authors of existing frameworks that 
provide those services, to enable their frameworks to be run under various 
web servers, and the authors of web servers, to support WSGI so those 
frameworks can run.

To a limited extent, WSGI itself can support new "ultralight" frameworks, 
in the sense that WSGI is intended to allow easy creation of "middleware" 
components.  For example, one could create a WSGI "session manager" that 
looks at a request and adds a session object to the 'environ' dictionary 
under a special key.

The point is that since it's a standardized API, you can plug together 
whatever components you want or need.
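
As a sketch only, such a session manager might look like the following in
present-day Python. The class name, the 'example.session' key, the 'sid'
cookie name and the in-memory store are all assumptions for illustration,
and the optional exc_info argument follows the start_response signature of
the later published spec (PEP 3333) rather than the draft under discussion.

```python
import uuid
from http.cookies import SimpleCookie


class SessionMiddleware:
    """Attach a per-client dict to environ under a namespaced key."""

    def __init__(self, app, key="example.session"):
        self.app = app
        self.key = key
        self.sessions = {}  # session id -> dict; in-memory, illustration only

    def __call__(self, environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        sid = cookie["sid"].value if "sid" in cookie else None
        is_new = sid not in self.sessions
        if is_new:
            sid = uuid.uuid4().hex
            self.sessions[sid] = {}
        environ[self.key] = self.sessions[sid]

        def session_start_response(status, headers, exc_info=None):
            # Hand the id back to the client the first time we see it.
            if is_new:
                headers = list(headers) + [("Set-Cookie", "sid=%s" % sid)]
            return start_response(status, headers, exc_info)

        return self.app(environ, session_start_response)


def counter_app(environ, start_response):
    """Toy application that only knows about the environ key."""
    session = environ["example.session"]
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("hits: %d" % session["hits"]).encode("ascii")]


app = SessionMiddleware(counter_app)
```

The application depends only on the agreed key, so the session component
can be swapped for another one without touching the application.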


>And how about simply creating a wsgi module that emulates the cgi module, 
>except works across different web platforms?

That's not in scope for WSGI, whose goals specifically state that the 
specification must *not* require anything added to the standard library.

This does not preclude separate proposals for standard library enhancements 
based on WSGI; it's just that they're not a part of *this* proposal.

From paul.boddie at ementor.no  Mon Aug 16 10:13:32 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Mon Aug 16 10:13:37 2004
Subject: [Web-SIG] WSGI - alternative ideas
Message-ID: <FD72AF7813F1294C95279EC6D9784A2F015719B8@100NOOSLMSG004.common.alpharoot.net>

> angryhicKclown@netscape.net wrote:
> 
> Hi, I've just subscribed to this list, but I've read much of the archives.
> Python is in dire and immediate need of WSGI.

As later messages have suggested, it isn't so much WSGI that you're looking
for, but a standardised API for application development.

> I think WSGI needs to be essentially very similar to jonpy (jonpy.sf.net),
> except without the templating. Jonpy exposes an interface very similar to
> Java servlets, and can run on cgi, fastcgi, and mod_python by changing one
> line of code. WSGI, I believe, should be a higher-level interface than
> what has been currently outlined. For Python to succeed as a web language
> (and I believe that it will), it needs to support the following out of the
> box:
> 
> - a clean servlet interface, see jonpy's Handler classes
> - support for a multitude of different platforms easily

So far, this is what WebStack [1] does. I suppose I could have either
extended jonpy or adopted the API, but I have tried to implement something
which is more complete from the lowest levels upward.

> - sessions
> - database connection pooling
> - caching

Things like shared resources aren't yet supported by WebStack, but I'm
thinking of ways to expose framework functionality in a uniform fashion.

> The syntax for something like this would be as follows:
> 
> -------------------------
> 
> import wsgi
> 
> class MyServlet(wsgi.Servlet): # perhaps a different name than Servlet?
>     def handle(self, req, **formargs):
>         pass
> 
> wsgi.main(MyServlet())

This is a lot like WebStack except that the initialisation of resources
(servlets in the above example) varies across frameworks. Therefore, you
wouldn't initialise resources in the same place as they are defined - see
the examples for WebStack for more details.

> ------------------
> 
> The wsgi module should automatically detect if it's running under CGI,
> mod_python, fastcgi, PyWX, or even IIS ASP with Python activex script or
> ISAPI. The request args are passed as key=value, unless there are multiple
> values for one key, in which case the values are passed as a list.

See the WebStack framework support and the WebStack.Generic module for the
API. I've been very conservative with the multiple values per parameter
issue, always returning a list of values whether you intended there to be
just one or not, mostly because developers should be aware of such issues
that, if exploited by mischievous users, could make their solutions less
robust.

> The request object would support sessions via a "req.sessions" dict. WSGI
> would pick the storage method it uses depending on what platform it is run
> on.

This is the general idea for WebStack's eventual session support.

> It would also support a database pool by using a "req.pool" object. I
> believe it should support pooling of any type of class. Here's an idea for
> syntax:
> 
> req.pool['database'] = (MySQLdb.connect,
>     {'user':'example','passwd':'secret','db':'example'})
> 
> And a call to req.pool['database'] would check out a connection to that
> database, and would be automatically returned at the end of the request.

I'm inclined to utilise a general database pooling package rather than
invent an API which in eventual hindsight could seem to be inadequate.

> Or am I taking this at too high a level? Perhaps it should simply clone
> the cgi module for different platforms (i.e. from wsgi import cgi, from
> wsgi import mod_python), or, perhaps the wsgi module will expose the same
> interface as the cgi module, and autodetect the platform and act
> accordingly.

I think you're looking at the right level for standardisation. What WSGI is
meant for, as far as I've discovered through reading this list and by the
occasional question, is the deployment of existing applications on top of
frameworks or servers which do not natively support the API employed by
those applications; as I noted, to a Webware application running on some
WSGI layer, all frameworks and servers look like Webware.

The problem with (or rather the problem avoided by) WSGI is that it doesn't
provide any coherency for people writing applications or higher-level
frameworks - by the latter, I'm talking about things which do form handling
and templating - you still have to choose your favourite framework and then
hope that the tricks you've employed will work on WSGI. This means that
newcomers still have to stare down that recently pruned list on the
WebProgramming page [2].

Paul

[1] http://www.python.org/pypi?%3Aaction=search&name=WebStack
[2] http://www.python.org/cgi-bin/moinmoin/WebProgramming

From pje at telecommunity.com  Mon Aug 16 17:30:32 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 16 17:31:17 2004
Subject: [Web-SIG] WSGI - alternative ideas
In-Reply-To: <FD72AF7813F1294C95279EC6D9784A2F015719B8@100NOOSLMSG004.co
	mmon.alpharoot.net>
Message-ID: <5.1.1.6.0.20040816111206.036d47a0@mail.telecommunity.com>

At 10:13 AM 8/16/04 +0200, Paul Boddie wrote:

>The problem with (or rather the problem avoided by) WSGI is that it doesn't
>provide any coherency for people writing applications or higher-level
>frameworks - by the latter, I'm talking about things which do form handling
>and templating - you still have to choose your favourite framework and then
>hope that the tricks you've employed will work on WSGI. This means that
>newcomers still have to stare down that recently pruned list on the
>WebProgramming page [2].

Well, at least it doesn't *add* a new choice to that list.  ;)

It *does*, however, create an environment that allows for "non-framework" 
frameworks, since middleware components can add arbitrary data and service 
objects to the 'environ'.  (And, there's also nothing stopping components 
from being distributed as non-middleware functions or objects that one 
supplies the 'environ' to, in order to obtain data or do things.)

So, even though WSGI itself doesn't provide a higher-level API, its 
existence and popularity should eventually allow users to choose framework 
services on a piece-by-piece rather than framework-at-a-time basis.

But, we won't get there if WSGI doesn't get implemented in web servers, and 
it won't be attractive for server authors unless there's a "market" for 
WSGI web servers.  And there won't be a significant market for them unless 
existing software, under existing frameworks, can run on WSGI.

Anyway, once WSGI middleware components are popular, there's then a market 
for framework authors to allow WSGI components to be plugged in *below* 
their frameworks, e.g. as objects in a Zope "folder", as Webware 
"resources", etc.  Once this happens, I expect some framework authors may 
see the value in refactoring their framework as a collection of WSGI 
middleware components...  at which point frameworks disappear, and 
components reign supreme.  Ultimate choice and flexibility now belongs to 
the user, and we all live happily ever after in the land of happy happy web 
programmers, or something like that.

That is a *long* way off, however.  The reality today is that nothing is 
going to change without a clear win for the framework authors whose 
frameworks own the bulk of the market share in Python web 
applications.  Trying to directly create a new, competing API is quite 
simply an attack on their investment, and it's not going to get us 
anywhere.  At the least, such a new API doesn't do anything positive for 
them.  In principle, WSGI will let their apps run on more servers, and is 
simple enough for server and framework authors to try it out as an experiment.

From ianb at colorstudy.com  Thu Aug 19 06:50:54 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu Aug 19 06:50:57 2004
Subject: [Web-SIG] WSGI uses
Message-ID: <412431AE.9050909@colorstudy.com>

I was playing around with making a WSGI server, and I'm starting to 
think that some really neat stuff could be done with middleware.

For instance, I was thinking about setting up something for Medusa with 
WSGI.  But though I think asynchronous code seems like a good server 
architecture, I'm not that interested in it for applications.  But this 
iteration of the WSGI spec allows for async pretty well; you can tell 
you are in that situation when wsgi.multiprocess is false and 
wsgi.multithread is false, and the iterator output can produce the data 
fairly well.

I then realized that threading itself could be a piece of middleware -- 
you just have to do the proper buffering with input and output.  An 
intelligent application that realizes it can't run as an async process 
could install this middleware itself when necessary.

Even multiprocess could really be implemented as a piece of middleware, 
either running CGI scripts, or forking worker processes.  It could get 
out of hand if every application in a multi-application system had its 
own middleware; but the extension mechanism could also allow you to 
lazily implement these models, providing callbacks to access existing 
thread or worker process pools.

Another useful piece of middleware would be something for error 
reporting; it would basically pass everything through, but wrap 
everything in try:except:.  Then you could develop and plug a nice 
debugger into whatever architecture, and the basic server can do just 
the most minimal error logging (basically print a traceback to the error 
log).
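
A minimal sketch of such a pass-through wrapper, in present-day Python. The
class name ErrorReporter and the 'log' argument are hypothetical. Note that
this version buffers the whole response with list() so that errors raised
while iterating are caught too, which gives up streaming; passing exc_info
to start_response and calling close() follow the later published spec
(PEP 3333).

```python
import sys
import traceback


class ErrorReporter:
    """Pass requests through unchanged, but catch and log exceptions."""

    def __init__(self, app, log=sys.stderr):
        self.app = app
        self.log = log

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                # Consume the iterable here so that errors raised lazily
                # (for example inside a generator) are also caught.
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            # A debugger or nicer error page could be plugged in here.
            traceback.print_exc(file=self.log)
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain")],
                sys.exc_info(),
            )
            return [b"Internal Server Error\n"]
```

Wrapping is a one-liner, e.g. application = ErrorReporter(application), and
the wrapped object is itself an ordinary WSGI application.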

Anyway, I'm pretty excited about the possibilities.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Thu Aug 19 07:28:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 19 07:28:22 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <412431AE.9050909@colorstudy.com>
Message-ID: <5.1.1.6.0.20040819012310.02bad720@mail.telecommunity.com>

At 11:50 PM 8/18/04 -0500, Ian Bicking wrote:
>I was playing around with making a WSGI server, and I'm starting to think 
>that some really neat stuff could be done with middleware.

Indeed.  In recent months, when I was refactoring peak.web to remove its 
dependency on Zope X3, I came up with a variant of the current WSGI 
interface as a strictly internal coupling mechanism for peak.web.  It 
turned out to be a delightfully simple way to connect internal 
components.  A peak.web user had previously complained about the difficulty 
of wrapping arbitrary postprocessing around pages, so I invented a more 
"functional" protocol for coupling peak.web components.  The main 
difference between it and the current WSGI was that it returned the status, 
headers, and iterator all in one tuple.  Tony Lownds proposed the write = 
start_response(status,headers) part, and I merged it with returning an 
iterator to form the new interface.  It's a reasonably lightweight API for 
both the server and application sides, but it's *remarkably* lightweight 
for middleware.
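
To make the difference concrete, here is a sketch of an adapter between the
two styles in present-day Python. The names as_wsgi and tuple_app are
hypothetical, and the tuple-returning callable is assumed to take only the
environ; this is not peak.web's actual API.

```python
def as_wsgi(tuple_app):
    """Adapt an app returning (status, headers, iterable) to WSGI style."""
    def wsgi_app(environ, start_response):
        status, headers, body = tuple_app(environ)
        start_response(status, headers)
        return body
    return wsgi_app


def hello(environ):
    # Tuple style: everything comes back in one return value.
    return "200 OK", [("Content-Type", "text/plain")], [b"hello\n"]


application = as_wsgi(hello)
```

The adapter is only a few lines in this direction because the tuple style
carries no write() callable; going the other way would need to capture the
start_response call and any write() output.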

From fumanchu at amor.org  Thu Aug 19 10:56:44 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Aug 19 11:02:03 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net>

> The ``environ`` dictionary is required to contain CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_.  In addition, it must contain the following WSGI-defined
> variables:

> Finally, the ``environ`` dictionary may also contain server-defined
> variables.  These variables should be named using only lower-case
> letters, numbers, dots, and underscores, and should be prefixed with
> a name that is unique to the defining server or gateway.  For
> example, ``mod_python`` might define variables with names like
> ``mod_python.some_variable``.

I'm all for simplicity, but also for ubiquity; I'd like to see a
standard "uploads" entry in the environ dict. I'd really hate to see
environ['mod_python.uploaded_files'] which is different from, say,
environ['iis_asp.files_which_have_been_uploaded'] when they don't need
to be specialized. Example:

environ['uploads'] = {supplied_filename: read_func, ...}


mod_python, for example, would populate it via:

from mod_python import util

environ['uploads'] = {}  # start empty so the assignments below work
for param in (util.FieldStorage(req, 1).list or []):
    if param.filename:
        environ['uploads'][param.filename] = param.file.read
    else:
        # handle non-file param
        ...

Perhaps there are other candidates for standardized (but not required)
entries?


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org


* Introduction at the end: Hello, all. I'm fairly new to Python (~ 1
year). I just replaced the existing core business webapp at my company
(which I wrote in VB4 !) with a more enterprise-level Python one. So
I've rolled my own framework (templating, ORM, and multi-webserver), at
least once. ;) Oh, and I also wrote a wiki-like app to run on the same
framework.
From and-py at doxdesk.com  Thu Aug 19 15:16:08 2004
From: and-py at doxdesk.com (Andrew Clover)
Date: Thu Aug 19 15:15:32 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
In-Reply-To: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net>
References: <3A81C87DC164034AA4E2DDFE11D258E3022E0B@exchange.hqamor.amorhq.net>
Message-ID: <4124A818.2070907@doxdesk.com>

Robert Brewer <fumanchu@amor.org> wrote:

> I'm all for simplicity, but also for ubiquity; I'd like to see a
> standard "uploads" entry in the environ dict.

I wouldn't! WSGI should not touch the HTTP request input stream, and 
definitely should not attempt to parse a form submission to get fields 
and file uploads out of it.

That's the job of the framework (or other form-reading package not 
necessarily part of a complete framework). There are multiple existing 
form-reading implementations with different ways of working.

-- 
Andrew Clover
mailto:and@doxdesk.com
http://www.doxdesk.com/
From fumanchu at amor.org  Thu Aug 19 16:35:21 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Aug 19 16:40:40 2004
Subject: [Web-SIG] The rewritten WSGI pre-PEP
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0C@exchange.hqamor.amorhq.net>

Andrew Clover wrote:
> Robert Brewer <fumanchu@amor.org> wrote:
> 
> > I'm all for simplicity, but also for ubiquity; I'd like to see a
> > standard "uploads" entry in the environ dict.
> 
> I wouldn't! WSGI should not touch the HTTP request input stream, and 
> definitely should not attempt to parse a form submission to 
> get fields and file uploads out of it.
> 
> That's the job of the framework (or other form-reading package not 
> necessarily part of a complete framework). There are multiple 
> existing form-reading implementations with different ways of
> working.

Fair enough. That would really break chaining components, as well, now
that I think about it. Idea retracted.


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From tony at lownds.com  Thu Aug 19 21:37:01 2004
From: tony at lownds.com (tony@lownds.com)
Date: Thu Aug 19 21:52:30 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <412431AE.9050909@colorstudy.com>
References: <412431AE.9050909@colorstudy.com>
Message-ID: <51261.204.162.121.54.1092944221.squirrel@*>

> For instance, I was thinking about setting up something for Medusa with
> WSGI.  But though I think asynchronous code seems like a good server
> architecture, I'm not that interested in it for applications.  But this
> iteration of the WSGI spec allows for async pretty well; you can tell
> you are in that situation when wsgi.multiprocess is false and
> wsgi.multithread is false, and the iterator output can produce the data
> fairly well.
>

How do you decide when to actually send the data back to the client? On
every yield?

That could perform badly if one does

def application(...):
  ...
  return open(filename)

...that usage is actually suggested in the spec.

In a similar vein, if servers/gateways send data back on every call to
write, and applications don't take that into account, they could also
suffer in performance. It seems like an object with write() and flush()
makes it easier to provide guarantees about streaming -- which I think
WSGI ought to do.

> I then realized that threading itself could be a piece of middleware --
> you just have to do the proper buffering with input and output.  An
> intelligent application that realizes it can't run as an async process
> could install this middleware itself when necessary.
>

Did you find that an async server has to provide a new buffer for every
request to implement the write() function correctly?

Although I suggested the (env, start_response) -> write() protocol, it
just can't adapt to future needs. As soon as more than one function/method
is needed, the API is broken -- and can't be fixed.

For instance, having one method to start the response and NOT get a
write() function could allow servers/gateways to avoid some work...

What about passing in a class with class methods in place of the
start_response method? i.e.

class ContextLogic:
    @classmethod
    def start_writing(cls, env, status, headers):
        cls.start(env, status, headers)
        # prepare output object
        return output.write

    @classmethod
    def start(cls, env, status, headers):
        ...

    @classmethod
    def request_url(cls, env):
        ...

    @classmethod
    def get_input_stream(cls, env):
        ...

Contexts can be re-used, and middleware does not have to delegate (it just
subclasses on the fly).

class Pooler:
    class PoolingLogicMixin:
        @classmethod
        def get_pool(cls, env):
            ...

    def __init__(self, subapp, ...):
        self.subapp = subapp

    def __call__(self, context, env):
        # The mixin is nested inside Pooler, so look it up via self.
        class NewContext(self.PoolingLogicMixin, context): pass
        return self.subapp(NewContext, env)

One more thought: how about using the term WSGI "driver" instead of
server/gateway?

-Tony


From fumanchu at amor.org  Thu Aug 19 22:25:55 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Aug 19 22:31:15 2004
Subject: [Web-SIG] WSGI uses
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E0D@exchange.hqamor.amorhq.net>

Tony Lownds wrote:
> One more thought: how about using the term WSGI "driver" instead of
> server/gateway?

+1. And "provider" or some word besides the extremely-overused
"application", which usually has already been used by any given
framework.


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From pje at telecommunity.com  Thu Aug 19 22:59:23 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 19 22:59:15 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <51261.204.162.121.54.1092944221.squirrel@*>
References: <412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
Message-ID: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>

At 12:37 PM 8/19/04 -0700, tony@lownds.com wrote:
> > For instance, I was thinking about setting up something for Medusa with
> > WSGI.  But though I think asynchronous code seems like a good server
> > architecture, I'm not that interested in it for applications.  But this
> > iteration of the WSGI spec allows for async pretty well; you can tell
> > you are in that situation when wsgi.multiprocess is false and
> > wsgi.multithread is false, and the iterator output can produce the data
> > fairly well.
> >
>
>How do you decide when to actually send the data back to the client? On
>every yield?

That's up to the server to decide.


>In a similar vein, if servers/gateways send data back on every call to
>write, and applications don't take that into account, they could also
>suffer in performance. It seems like an object with write() and flush()
>makes it easier to provide guarantees about streaming -- which I think
>WSGI ought to do.

If you need to avoid creating data before the client is ready for it, you 
should use the async interface (yielding data) rather than the push 
interface (write() calls).  An asynchronous server should avoid moving the 
iterator forward when the outgoing socket isn't ready for data to be sent.
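
As a sketch of the yielding style in present-day Python: the application
below sends a partial response, does some work, then sends the rest. The
sum() call is only a stand-in for slow work, and the optional exc_info
argument in the stub follows the later published spec (PEP 3333).

```python
def application(environ, start_response):
    """Generator-style app: the server pulls one chunk at a time."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"Working...\n"       # the server may send this right away
    total = sum(range(1000))    # stand-in for a long computation
    yield ("Result: %d\n" % total).encode("ascii")


def ignore_start_response(status, headers, exc_info=None):
    """Stub standing in for the server's callable."""


chunks = application({}, ignore_start_response)
first = next(chunks)   # the slow work has not started at this point
rest = list(chunks)    # the server asks for more when the socket is ready
```

Because nothing after the first yield runs until the server asks for the
next chunk, the server decides the pace, not the application.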


> > I then realized that threading itself could be a piece of middleware --
> > you just have to do the proper buffering with input and output.  An
> > intelligent application that realizes it can't run as an async process
> > could install this middleware itself when necessary.
> >
>
>Did you find that an async server has to provide a new buffer for every
>request to implement the write() function correctly?

I'm not sure I'm following either you or Ian here.


>Although I suggested the (env, start_response) -> write() protocol, it
>just can't adapt to future needs. As soon as more than one function/method
>is needed, the API is broken -- and can't be fixed.

Actually, there are several extension routes available, such as adding 
optional or keyword parameters to start_response() and write().


>For instance, having one method to start the response and NOT get a
>write() function could allow servers/gateways to avoid some work...

In order to be compliant, the server *must* support the write() facility, 
so there's no point to making it optional.


>What about passing in a class with class methods in place of the
>start_response method? i.e.
>
>class ContextLogic:
>     @classmethod
>     ....
>
>Contexts can be re-used, and middleware does not have to delegate (it just
>subclasses on the fly).

Interesting concept, although it means that servers would also be 
subclassing on-the-fly if they need per-request data on the 
context.  Though I suppose all methods could be required to take the 
environment as a parameter.

So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing 
super-minimalistic.  Everything that expands the scope increases the range 
of ways that people can accidentally write implementations that don't 
interoperate.


>One more thought: how about using the term WSGI "driver" instead of
>server/gateway?

But servers and gateways are what they *are*.  They're not "drivers" in any 
sense that I understand, at least.

From tony at lownds.com  Thu Aug 19 23:33:59 2004
From: tony at lownds.com (tony@lownds.com)
Date: Thu Aug 19 23:49:28 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
References: <412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
Message-ID: <52525.204.162.121.54.1092951239.squirrel@*>

[Phillip]
> If you need to avoid creating data before the client is ready for it, you
> should use the async interface (yielding data) rather than the push
> interface (write() calls).  An asynchronous server should avoid moving the
> iterator forward when the outgoing socket isn't ready for data to be sent.
>

The use case I had in mind was the application sending a partial response,
then doing a lot of work, then sending the rest of the response. I guess
you are saying that WSGI apps shouldn't use write() in that case. I wonder
when they should use write() then. If it's a second class citizen to the
iterator why not force all applications to provide their own buffering?

>>Although I suggested the (env, start_response) -> write() protocol, it
>>just can't adapt to future needs. As soon as more than one function/method
>>is needed, the API is broken -- and can't be fixed.
>
> Actually, there are several extension routes available, such as adding
> optional or keyword parameters to start_response() and write().
>

True. That's less than elegant, though.

>>What about passing in a class with class methods in place of the
>>start_response method? i.e.
>
> Interesting concept, although it means that servers would also be
> subclassing on-the-fly if they need per-request data on the
> context.  Though I suppose all methods could be required to take the
> environment as a parameter.
>

Yes, all methods would need to take env.

> So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing
> super-minimalistic.  Everything that expands the scope increases the range
> of ways that people can accidentally write implementations that don't
> interoperate.
>

It just seems too minimal. It's hard to see how a server could cleanly
implement a more powerful API than WSGI 1.0 and still be backwards
compatible with apps/frameworks that use the WSGI 1.0 interface.

>
>>One more thought: how about using the term WSGI "driver" instead of
>>server/gateway?
>
> But servers and gateways are what they *are*.  They're not "drivers" in
> any
> sense that I understand, at least.
>

The way I see it, server is apache, or mod_python -- there would be a
piece of code that implements the WSGI interface on top of the server.
That's the driver.

-Tony

From pje at telecommunity.com  Thu Aug 19 23:59:17 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 19 23:59:09 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <52525.204.162.121.54.1092951239.squirrel@*>
References: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>

At 02:33 PM 8/19/04 -0700, tony@lownds.com wrote:
>[Phillip]
> > If you need to avoid creating data before the client is ready for it, you
> > should use the async interface (yielding data) rather than the push
> > interface (write() calls).  An asynchronous server should avoid moving the
> > iterator forward when the outgoing socket isn't ready for data to be sent.
> >
>
>The use case I had in mind was the application sending a partial response,
>then doing a lot of work, then sending the rest of the response. I guess
>you are saying that WSGI apps shouldn't use write() in that case. I wonder
>when they should use write() then. If it's a second class citizen to the
>iterator why not force all applications to provide their own buffering?

I don't see what you mean about buffering.  As for it being a second-class 
citizen, it certainly is.  But application frameworks that currently 
provide some analogue to write() in their API today, can't live without 
it.  So, the write() functionality has to be there or WSGI is DOA.


> > So far, I'm -0.5 on the idea, as I'd *really* like to keep the whole thing
> > super-minimalistic.  Everything that expands the scope increases the range
> > of ways that people can accidentally write implementations that don't
> > interoperate.
> >
>
>It just seems too minimal. It's hard to see how a server could cleanly
>implement a more powerful API than WSGI 1.0 and still be backwards
>compatible with apps/frameworks that use the WSGI 1.0 interface.

What would this "more powerful API" consist of?  WSGI is a paper-thin 
abstraction of HTTP; that's its sole purpose.


> > But servers and gateways are what they *are*.  They're not "drivers" in
> > any
> > sense that I understand, at least.
> >
>
>The way I see it, server is apache, or mod_python -- there would be a
>piece of code that implements the WSGI interface on top of the server.
>That's the driver.

I've written a prototype WSGI server based on the previous draft: all it 
does is serve WSGI apps, so there's no "driver" involved.  I expect there 
will be other such fully-integrated servers.  A CGI-based gateway also 
isn't *part* of the server it runs under, so it's a gateway, not a 
driver.  Thus, it seems to me there are only servers and gateways.  That 
some gateways may be implemented as a driver within a server seems like 
obscuring the *purpose* of the API (allowing an application to run in a 
server or a gateway thereto) in favor of an implementation detail that 
doesn't even always apply.

From tony at lownds.com  Fri Aug 20 01:35:30 2004
From: tony at lownds.com (tony@lownds.com)
Date: Fri Aug 20 01:51:00 2004
Subject: [Web-SIG] WSGI uses
In-Reply-To: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
Message-ID: <53474.204.162.121.54.1092958530.squirrel@*>

>> If it's a second class citizen to the
>>iterator why not force all applications to provide their own buffering?
>
> I don't see what you mean about buffering.

s/buffering/write()/

An application framework can provide its own write() very easily

def app_framework(env, start_response):
  start_response(...)
  buffer = []
  write = buffer.append
  ...
  return buffer


> As for it being a second-class
> citizen, it certainly is.  But application frameworks that currently
> provide some analogue to write() in their API today, can't live without
> it.  So, the write() functionality has to be there or WSGI is DOA.
>
>

Ok

> What would this "more powerful API" consist of?  WSGI is a paper-thin
> abstraction of HTTP; that's its sole purpose.
>

One example: redirecting to a resource internal to the server (like
Location: for CGI)

I suppose you could use a specific status code or a header.

> Thus, it seems to me there are only servers and gateways.

Ok

-Tony

From pje at telecommunity.com  Fri Aug 20 18:04:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 20 18:04:09 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <53474.204.162.121.54.1092958530.squirrel@*>
References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>

At 04:35 PM 8/19/04 -0700, tony@lownds.com wrote:
> >> If it's a second class citizen to the
> >>iterator why not force all applications to provide their own buffering?
> >
> > I don't see what you mean about buffering.
>
>s/buffering/write()/
>
>An application framework can provide its own write() very easily
>
>def app_framework(env, start_response):
>   start_response(...)
>   buffer = []
>   write = buffer.append
>   ...
>   return buffer

Ah.  But that doesn't allow *streaming* writes.  The specific use case I 
had in mind for using your write() idea was to allow frameworks that 
currently allow streaming writes as a function/method invocation during the 
request execution, to still work under WSGI.

In effect, 'write()' is a backward compatibility mechanism for existing 
code that expects to be able to stream data to the client during request 
execution, and is not currently written in the form of an 
iterator/producer.  (It's also an acceptable mechanism for small/fast 
requests, and frameworks that normally buffer their I/O and would only call 
'write()' once anyway.)
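
To make the difference concrete, here is a minimal sketch under the draft's
(environ, start_response) -> write() protocol.  Everything in it is
illustrative rather than part of the spec: the helper names, the
'demo.log' environ key, and the in-memory "server" that merely records the
order in which events happen.

```python
def make_start_response(log):
    """Build a start_response whose write() records each chunk on arrival."""
    def start_response(status, headers):
        log.append(("start", status))

        def write(data):
            log.append(("write", data))
        return write
    return start_response


def buffered_app(environ, start_response):
    # Framework-level write() that only appends to a list: nothing reaches
    # the server until the application returns.
    start_response("200 OK", [("Content-type", "text/plain")])
    buffer = []
    write = buffer.append
    write("please wait...\n")
    environ["demo.log"].append(("work", "lengthy calculation"))
    write("done\n")
    return buffer


def streaming_app(environ, start_response):
    # Server-supplied write() called during request execution: the first
    # chunk is handed to the server before the lengthy work starts.
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write("please wait...\n")
    environ["demo.log"].append(("work", "lengthy calculation"))
    write("done\n")
    return None


def run(app):
    """Invoke app against the recording server; return the event order."""
    log = []
    result = app({"demo.log": log}, make_start_response(log))
    for chunk in result or ():
        log.append(("write", chunk))
    return [kind for kind, _ in log]
```

With the buffered variant the "work" event precedes both writes; with the
streaming variant the first write precedes the work, which is the behavior
existing streaming frameworks rely on.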

Still, your comments have illustrated to me that there does need to be 
better definition of how flushing is expected to occur, although there is 
only one use case I can think of for it.  Specifically, the only time an 
application needs to ensure that all its pending output has been sent to 
the client, is when it is about to perform some lengthy calculation and is 
using "server push" to display a "please wait" screen before returning the 
real result.  In this case, if I/O is single-threaded (i.e. only happens 
when write() calls are made), and write() isn't guaranteed to be flushed 
(e.g. it's buffered and sent in blocks), then the application would need to 
have a way to say, "no, really, please send it *now*."

On the other hand, if I/O is single-threaded in that fashion, then the 
server should be required to finish every write() before the write() call 
returns.  The write() function should only be allowed to buffer the data if 
another thread is emptying the buffer continuously.

I'll add this to the spec, unless anybody knows of any other use cases for 
either buffering or not buffering writes.

From ianb at colorstudy.com  Fri Aug 20 18:13:21 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 20 18:14:40 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
Message-ID: <41262321.5080306@colorstudy.com>

Phillip J. Eby wrote:
> Still, your comments have illustrated to me that there does need to be 
> better definition of how flushing is expected to occur, although there 
> is only one use case I can think of for it.  Specifically, the only time 
> an application needs to ensure that all its pending output has been sent 
> to the client, is when it is about to perform some lengthy calculation 
> and is using "server push" to display a "please wait" screen before 
> returning the real result.  In this case, if I/O is single-threaded 
> (i.e. only happens when write() calls are made), and write() isn't 
> guaranteed to be flushed (e.g. it's buffered and sent in blocks), then 
> the application would need to have a way to say, "no, really, please 
> send it *now*."

In some environments (e.g., CGI) I don't believe there's any way to 
ensure that the data gets sent immediately.  The buffering is rather 
opaque in those cases.  So all we can do is try, we can't really 
guarantee that data will be sent.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Fri Aug 20 18:37:54 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 20 18:37:46 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <41262321.5080306@colorstudy.com>
References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com>

At 11:13 AM 8/20/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Still, your comments have illustrated to me that there does need to be 
>>better definition of how flushing is expected to occur, although there is 
>>only one use case I can think of for it.  Specifically, the only time an 
>>application needs to ensure that all its pending output has been sent to 
>>the client, is when it is about to perform some lengthy calculation and 
>>is using "server push" to display a "please wait" screen before returning 
>>the real result.  In this case, if I/O is single-threaded (i.e. only 
>>happens when write() calls are made), and write() isn't guaranteed to be 
>>flushed (e.g. it's buffered and sent in blocks), then the application 
>>would need to have a way to say, "no, really, please send it *now*."
>
>In some environments (e.g., CGI) I don't believe there's any way to ensure 
>that the data gets sent immediately.  The buffering is rather opaque in 
>those cases.  So all we can do is try, we can't really guarantee that data 
>will be sent.

True enough.  I've not seen a problem with CGI myself, but I believe some 
CGI-based protocols (not FastCGI, but some clones of it) buffer entire 
requests no matter what you do.  WSGI servers or gateways that can't do 
streaming should document that fact.  Or perhaps there should be two 
compliance levels: WSGI Basic and WSGI Streaming.

From ianb at colorstudy.com  Fri Aug 20 18:43:10 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 20 18:44:22 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com>
References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com>
Message-ID: <41262A1E.5020508@colorstudy.com>

Phillip J. Eby wrote:
> True enough.  I've not seen a problem with CGI myself, but I believe 
> some CGI-based protocols (not FastCGI, but some clones of it) buffer 
> entire requests no matter what you do.  WSGI servers or gateways that 
> can't do streaming should document that fact.  Or perhaps there should 
> be two compliance levels: WSGI Basic and WSGI Streaming.

It could just be something like a 'wsgi.streaming' key in the 
environment, no?  Gateways should be encouraged to set that to false 
until they've confirmed that streaming really works consistently.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From tony at lownds.com  Fri Aug 20 18:55:40 2004
From: tony at lownds.com (tony@lownds.com)
Date: Fri Aug 20 19:11:23 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
References: <5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><412431AE.9050909@colorstudy.com><412431AE.9050909@colorstudy.com><5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com><5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
Message-ID: <60553.68.122.70.234.1093020940.squirrel@*>

> Still, your comments have illustrated to me that there does need to be
> better definition of how flushing is expected to occur

Thanks for deciphering those comments. This is what I was hoping for.

>... the application would need
> to
> have a way to say, "no, really, please send it *now*."
>

Are you considering requiring start_response() return an object with
.write() and .flush() methods?

-Tony

From pje at telecommunity.com  Fri Aug 20 19:21:05 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 20 19:20:56 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <60553.68.122.70.234.1093020940.squirrel@*>
References: <5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040820131610.02bbcec0@mail.telecommunity.com>

At 09:55 AM 8/20/04 -0700, tony@lownds.com wrote:
> > Still, your comments have illustrated to me that there does need to be
> > better definition of how flushing is expected to occur
>
>Thanks for deciphering those comments. This is what I was hoping for.
>
> >... the application would need
> > to
> > have a way to say, "no, really, please send it *now*."
> >
>
>Are you considering requiring start_response() return an object with
>.write() and .flush() methods?

No, I'm suggesting that write() should be guaranteed to either:

   1) Flush all output before returning, or
   2) Put data in a buffer that will be emptied by another thread or by the 
operating system

To be a conforming implementation, a server/gateway must do one or the 
other.  The same rules should apply for data yielded by a returned 
iterator, i.e. the data must be sent or buffered for continuous sending 
before the iterator's next() method is called again.
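
As a rough sketch of option 1, assuming a single-threaded server that owns
a file-like output stream: the server flushes after every write() call and
after every string the iterator yields.  CountingStream, serve() and
demo_app are made-up names for the illustration; the stream is an
in-memory stand-in for a client connection.

```python
class CountingStream:
    """In-memory stand-in for a client connection; counts flush() calls."""

    def __init__(self):
        self.parts = []
        self.flushes = 0

    def write(self, data):
        self.parts.append(data)

    def flush(self):
        self.flushes += 1

    def getvalue(self):
        return "".join(self.parts)


def serve(app, out):
    """Run app once, flushing after each write() and each yielded string."""
    def start_response(status, headers):
        out.write("Status: %s\r\n" % status)
        for name, value in headers:
            out.write("%s: %s\r\n" % (name, value))
        out.write("\r\n")

        def write(data):
            out.write(data)
            out.flush()      # option 1: flush all output before returning
        return write

    result = app({}, start_response)
    for chunk in result or ():
        out.write(chunk)
        out.flush()          # same rule, applied before the iterator advances


def demo_app(environ, start_response):
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write("a")
    return ["b", "c"]
```

Option 2 would replace the flush() calls with a hand-off to a buffer that
another thread (or the operating system) drains continuously.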

From pje at telecommunity.com  Sat Aug 21 00:42:53 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 21 00:42:48 2004
Subject: [Web-SIG] Write buffering (was Re: WSGI uses)
In-Reply-To: <41262A1E.5020508@colorstudy.com>
References: <5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<412431AE.9050909@colorstudy.com> <412431AE.9050909@colorstudy.com>
	<5.1.1.6.0.20040819164422.0268bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040819175043.02d693e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820114448.03c404e0@mail.telecommunity.com>
	<5.1.1.6.0.20040820123459.01f20b20@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040820183539.02964050@mail.telecommunity.com>

At 11:43 AM 8/20/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>True enough.  I've not seen a problem with CGI myself, but I believe some 
>>CGI-based protocols (not FastCGI, but some clones of it) buffer entire 
>>requests no matter what you do.  WSGI servers or gateways that can't do 
>>streaming should document that fact.  Or perhaps there should be two 
>>compliance levels: WSGI Basic and WSGI Streaming.
>
>It could just be something like a 'wsgi.streaming' key in the environment, 
>no?  Gateways should be encouraged to set that to false until they've 
>confirmed that streaming really works consistently.

I think I've convinced myself that servers or gateways must *always* 
attempt to stream data passed to write() or yielded by the iterator.  The 
only time this can cause any problems is if the application sends lots of 
small strings, and the I/O is single-threaded and unbuffered.  As a 
practical matter, TCP/IP stacks usually have at least a K or two of 
outbound buffering for a connection, don't they?  So until that fills up, 
the application will continue to execute normally.  It's not as good as 
something better, but it'll do.

So, I'm updating the spec and recommending that applications do buffering 
of their own for "moderately sized" responses that are neither too large 
for buffering nor too small to worry about it.  I know that Zope, for 
example, normally generates its body output as one big block anyway, and I 
think the common pattern for e.g. Python page templating systems is to 
produce their output as a single string, rather than in pieces.  So, it 
would seem in common cases that there will only be one write() call 
anyway.  (Especially since buffered dynamic output gives an application an 
error-handling advantage: it can send an error page rather than dumping 
error garbage into the middle of a partially-completed response.)

From pje at telecommunity.com  Sat Aug 21 01:20:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 21 01:42:15 2004
Subject: [Web-SIG] Latest WSGI Draft
Message-ID: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>

Once again, please pardon me if I missed an update, and gently remind me 
with a clue by four if need be.  :)  Or better yet, by supplying a patch 
implementing your suggested changes.  :)

I was going to post a diff, but even a unified diff is about as long as the 
previous version was, and the new draft is almost 50% longer than the old 
one, as lots of new material has been added about streaming, URL 
determination, required CGI variables, etc. etc.  There's even some extra 
material in the Rationale and Goals about using WSGI middleware to better 
modularize frameworks, allowing more mix-and-match between them.

I think this is just about ready to submit as an official PEP, get a 
numbering, and post to c.l.py and Python-Dev, but of course I could be 
wrong.  Your feedback is appreciated.


PEP: XXX
Title: Python Web Server Gateway Interface v1.0
Version: $Revision: 1.1 $
Last-Modified: $Date: 2004/08/20 19:11:27 $
Author: Phillip J. Eby <pje@telecommunity.com>
Discussions-To: Python Web-SIG <web-sig@python.org>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004


Abstract
========

This document specifies a proposed standard interface between web
servers and Python web applications or frameworks, to promote
web application portability across a variety of web servers.


Rationale and Goals
===================

Python currently boasts a wide variety of web application
frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO,
and Twisted Web -- to name just a few [1]_.  This wide variety
of choices can be a problem for new Python users, because
generally speaking, their choice of web framework will limit
their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application
frameworks available, Java's "servlet" API makes it possible
for applications written with any Java web application framework
to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web
servers for Python -- whether those servers are written in
Python (e.g. Medusa), embed Python (e.g. mod_python), or
invoke Python via a gateway protocol (e.g. CGI, FastCGI,
etc.) -- would separate choice of framework from choice
of web server, freeing users to choose a pairing that suits
them, while freeing framework and server developers to focus
on their area of specialty.

This PEP, therefore, proposes a simple and universal interface
between web servers and web applications or frameworks: the
Python Web Server Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the
existing state of servers and frameworks for Python web applications.
Server and framework authors and maintainers must actually implement
WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there
is little immediate reward for an author who implements WSGI support.
Thus, WSGI *must* be easy to implement, so that an author's initial
investment in the interface can be reasonably low.

Thus, simplicity of implementation on *both* the server and framework
sides of the interface is absolutely critical to the utility of the
WSGI interface, and is therefore the principal criterion for any
design decisions.

Note, however, that simplicity of implementation for a framework
author is not the same thing as ease of use for a web application
author.  WSGI presents an absolutely "no frills" interface to the
framework author, because bells and whistles like response objects
and cookie handling would just get in the way of existing frameworks'
handling of these issues.  Again, the goal of WSGI is to facilitate
easy interconnection of existing servers and applications or
frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that
is not already available in deployed versions of Python.  Therefore,
new standard library modules are not proposed or required by this
specification, and nothing in WSGI requires a Python version greater
than 1.5.2.  (It would be a good idea, however, for future versions
of Python to include support for this interface in web servers
provided by the standard library.)

In addition to ease of implementation for existing and future
frameworks and servers, it should also be easy to create request
preprocessors, response postprocessors, and other WSGI-based
"middleware" components that look like an application to their
containing server, while acting as a server for their contained
applications.

If middleware can be both simple and robust, and WSGI is widely
available in servers and frameworks, it allows for the possibility
of an entirely new kind of Python web application framework: one
consisting of loosely-coupled WSGI middleware components.  Indeed,
existing framework authors may even choose to refactor their
frameworks' existing services to be provided in this way, becoming
more like libraries used with WSGI, and less like monolithic
frameworks.  This would then allow application developers to choose
"best-of-breed" components for specific functionality, rather than
having to commit to all the pros and cons of a single framework.
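
As an illustration only, the following sketch shows what such a middleware
component might look like under the calling convention specified later in
this document.  The names ``uppercase_middleware``, ``hello_app`` and
``collect`` are invented for the example, and ``collect`` is a toy
stand-in for a real server or gateway.

```python
class _UpperIter:
    """Wrap an app's iterable, upper-casing strings and forwarding close()."""

    def __init__(self, result):
        self._result = result
        self._iter = iter(result)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()

    def close(self):
        close = getattr(self._result, "close", None)
        if close is not None:
            close()


def uppercase_middleware(app):
    """Looks like an application to the server, and like a server to app."""
    def wrapped(environ, start_response):
        def upper_start_response(status, headers):
            server_write = start_response(status, headers)

            def write(data):
                server_write(data.upper())
            return write

        result = app(environ, upper_start_response)
        return None if result is None else _UpperIter(result)
    return wrapped


def hello_app(environ, start_response):
    write = start_response("200 OK", [("Content-type", "text/plain")])
    write("hello ")
    return ["world\n"]


def collect(app):
    """Toy server: return everything the application emitted as one string."""
    chunks = []

    def start_response(status, headers):
        return chunks.append

    result = app({}, start_response)
    try:
        for chunk in result or ():
            chunks.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return "".join(chunks)
```

Because the wrapper exposes the same two-argument interface it consumes,
components written this way can be stacked in any order.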

Of course, as of this writing, that day is doubtless quite far off.
In the meantime, it is a sufficient short-term goal for WSGI to
enable the use of any framework with any server.

Finally, it should be mentioned that the current version of WSGI
does not prescribe any particular mechanism for "deploying" an
application for use with a web server or server gateway.  At the
present time, this is necessarily implementation-defined by the
server or gateway.  After a sufficient number of servers and
frameworks have implemented WSGI to provide field experience with
varying deployment requirements, it may make sense to create
another PEP, describing a deployment standard for WSGI servers and
application frameworks.


Specification Overview
======================

The WSGI interface has two sides: the "server" or "gateway" side,
and the "application" side.  The server side invokes a callable
object that is provided by the application side.  The specifics
of how that object is provided are up to the server or gateway.
It is assumed that some servers or gateways will require an
application's deployer to write a short script to create an
instance of the server or gateway, and supply it with the
application object.  Other servers and gateways may use
configuration files or other mechanisms to specify where the
application object should be imported from.

The application object is simply a callable object that accepts
two arguments.  The term "object" should not be misconstrued as
requiring an actual object instance: a function, method, class,
or instance with a ``__call__`` method are all acceptable for
use as an application object.  Here are two example application
objects; one is a function, and the other is a class::

     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         headers = [('Content-type','text/plain')]
         write = start_response(status, headers)
         write('Hello world!\n')


     class AppClass:
         """Much the same thing, but as a class"""

         def __init__(self, environ, start_response):
             self.environ = environ
             self.start = start_response

         def __iter__(self):
             status = '200 OK'
             headers = [('Content-type','text/plain')]
             self.start(status, headers)

             yield "Hello world!\n"
             for i in range(1,11):
                 yield "Extra line %s\n" % i


The server or gateway invokes the application once for each request
it receives from a web browser.  To illustrate, here is a simple
CGI gateway, implemented as a function taking an application object
(all error handling omitted)::

     import os, sys

     def run_with_cgi(application):

         environ = {}
         environ.update(os.environ)
         environ['wsgi.input']        = sys.stdin
         environ['wsgi.errors']       = sys.stderr
         environ['wsgi.version']      = (1,0)
         environ['wsgi.multithread']  = False
         environ['wsgi.multiprocess'] = True
         environ['wsgi.last_call']    = True

         def start_response(status,headers):
             print "Status:", status
             for key,val in headers:
                 print "%s: %s" % (key,val)
             print    # blank line terminates the CGI header block
             return sys.stdout.write

         result = application(environ, start_response)
         if result:
             try:
                 for data in result:
                     sys.stdout.write(data)
             finally:
                 if hasattr(result,'close'):
                     result.close()

In the next section, we will specify the precise semantics that
these illustrations are examples of.


Specification Details
=====================

The application object must accept two positional arguments.  For
the sake of illustration, we have named them ``environ``, and
``start_response``, but they are not required to have these names.
A server or gateway *must* invoke the application object using
positional (not keyword) arguments.

The first parameter is a dictionary object, containing CGI-style
environment variables.  This object *must* be a builtin Python
dictionary (*not* a subclass, ``UserDict`` or other dictionary
emulation), and the application is allowed to modify the dictionary
in any way it desires.  The dictionary must also include certain
WSGI-required variables (described in a later section), and may
also include server-specific extension variables, named according
to a convention that will be described below.

The second parameter is a callable accepting two positional
arguments: a status string of the form ``"999 Message here"``,
and a list of ``(header_name,header_value)`` tuples describing the
HTTP response header.  This callable must return another callable
that takes one parameter: a string to write as part of the HTTP
response body.
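
A server or test harness might sanity-check the arguments it receives
before emitting anything.  The checker below is only a sketch: the name
``check_response_args`` and the exact rules it enforces (three digits, a
space, then a message; a list of two-item string tuples) are one
interpretation of the paragraph above, not requirements stated by this
specification.

```python
import re

# Assumed reading of "999 Message here": three digits, a space, a message.
_STATUS_RE = re.compile(r"^\d{3} .+$")


def check_response_args(status, headers):
    """Return a list of problems found; an empty list means no complaints."""
    problems = []
    if not isinstance(status, str) or not _STATUS_RE.match(status):
        problems.append("status must look like '999 Message here'")
    if not isinstance(headers, list):
        problems.append("headers must be a list")
    else:
        for item in headers:
            ok = (isinstance(item, tuple) and len(item) == 2
                  and all(isinstance(part, str) for part in item))
            if not ok:
                problems.append("bad header entry: %r" % (item,))
    return problems
```

Returning the problems instead of raising lets the caller decide whether
to log them to ``wsgi.errors`` or to abort the request.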

The application object may return either ``None`` (indicating that
there is no additional output), or it may return a non-empty
iterable yielding strings.  (For example, it could be a
generator-iterator that yields strings, or it could be a
sequence such as a list of strings.)  The server or gateway will
treat the strings yielded by the iterable as if they had been
passed to the ``write()`` method.

Also, if the application returns an iterable, and the iterable has a
``close()`` method, the server or gateway *must* call that method
upon completion of the current request, whether the request was
completed normally, or terminated early due to an error.  (This is to
support resource release by the application.  This protocol is
intended to support PEP 325, and also the simple case of an
application returning an open text file.)
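
The sketch below illustrates the ``close()`` protocol from both sides.
``ResourceBody`` and ``send_body`` are hypothetical names: the first is an
application-side iterable that owns something needing release, the second
is the loop a server or gateway might run over whatever the application
returned.

```python
class ResourceBody:
    """Iterable response body that holds a resource until close() is called."""

    def __init__(self, lines):
        self._lines = lines
        self.closed = False

    def __iter__(self):
        return iter(self._lines)

    def close(self):
        # A real application would release a file, lock or connection here.
        self.closed = True


def send_body(result, write):
    """Forward each string to write(), then call close() no matter what."""
    try:
        for data in result:
            write(data)
    finally:
        if hasattr(result, "close"):
            result.close()
```

The ``finally`` clause is what covers the "terminated early due to an
error" case: ``close()`` still runs if ``write()`` raises because the
client has gone away.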


``environ`` Variables
---------------------

The ``environ`` dictionary is required to contain these CGI environment
variables, as defined by the Common Gateway Interface specification
[2]_.  The following variables *must* be present, but *may* be an empty
string, if there is no more appropriate value for them:

  * ``REQUEST_METHOD``

  * ``SCRIPT_NAME`` (The initial portion of the request URL's "path" that
    corresponds to the application object, so that the application knows
    its virtual "location".)

  * ``PATH_INFO`` (The remainder of the request URL's "path", designating
    the virtual "location" of the request's target within the application.)

  * ``QUERY_STRING``

  * ``CONTENT_TYPE``

  * ``CONTENT_LENGTH``

  * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
    ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the request URL)

  * Variables corresponding to the client-supplied HTTP headers (i.e.,
    variables whose names begin with ``"HTTP_"``).

In general, a server or gateway should attempt to provide as many
other CGI variables as are applicable, including e.g. the nonstandard
SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
However, an application that uses any variables other than the ones
listed above is necessarily non-portable to web servers that do not
support the relevant extensions.

A WSGI-compliant server or gateway *should* document what variables
it provides, along with their definitions as appropriate.  Applications
*should* check for the presence of any nonstandard variables they
require, and have a fallback plan in the event such a variable is
absent.
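
For example, an application that needs its own URL might combine the
required variables and treat the nonstandard ``HTTPS`` variable as
optional.  This is a sketch, not a normative algorithm: the function name
``request_url``, the fallback to ``http`` when ``HTTPS`` is absent, the
omission of default ports, and the re-quoting of the path components are
all assumptions made for the illustration.

```python
from urllib.parse import quote


def request_url(environ):
    """Rebuild the request URL from CGI-style variables in environ."""
    # HTTPS is nonstandard, so check for it and fall back to plain http.
    scheme = "https" if environ.get("HTTPS") == "on" else "http"
    default_port = "443" if scheme == "https" else "80"

    url = scheme + "://" + environ["SERVER_NAME"]
    port = environ["SERVER_PORT"]
    if port and port != default_port:
        url += ":" + port

    # Assumes SCRIPT_NAME and PATH_INFO arrive URL-decoded, as under CGI.
    url += quote(environ["SCRIPT_NAME"]) + quote(environ["PATH_INFO"])
    if environ["QUERY_STRING"]:
        url += "?" + environ["QUERY_STRING"]
    return url
```

Only ``HTTPS`` is read with ``environ.get()``; the other variables are
indexed directly because the list above requires them to be present.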

Note: missing variables (such as ``REMOTE_USER`` when no
authentication has occurred) should be left out of the ``environ``
dictionary.  Also note that CGI-defined variables must be strings,
if they are present at all.  It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``.
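
The requirements above can be checked mechanically.  The following is a
minimal sketch, assuming Python 3 (where native ``str`` stands in for the
draft's string type).  ``check_environ`` and ``REQUIRED_CGI_KEYS`` are
hypothetical names, and treating every all-uppercase key as a CGI variable
is a simplifying assumption, not part of this specification:

```python
# Variables the draft lists by name as required (they may be empty strings).
REQUIRED_CGI_KEYS = (
    "REQUEST_METHOD", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING",
    "CONTENT_TYPE", "CONTENT_LENGTH", "SERVER_NAME", "SERVER_PORT",
)


def check_environ(environ):
    """Return a list of problems found in the CGI part of ``environ``."""
    problems = []
    for key in REQUIRED_CGI_KEYS:
        if key not in environ:
            problems.append("missing required variable %s" % key)
    for key, value in environ.items():
        # Assumption: CGI-style names are all upper-case, while WSGI- and
        # server-defined names are lower-case (e.g. "wsgi.input").
        if key.isupper() and not isinstance(value, str):
            problems.append("%s must be a str, got %s"
                            % (key, type(value).__name__))
    return problems
```

A server author could run such a check in a test suite; an empty list
means the CGI portion of the dictionary satisfies the rules stated in this
section.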

In addition to the CGI-defined variables, the ``environ`` dictionary
must also contain the following WSGI-defined variables:

=====================  ==============================================
Variable               Value
=====================  ==============================================
``wsgi.version``       The tuple ``(1,0)``, representing WSGI
                        version 1.0.

``wsgi.input``         An input stream from which the HTTP request
                        body can be read.

``wsgi.errors``        An output stream to which error output can
                        be written.  For most servers, this will be
                        the server's error log.

``wsgi.multithread``   This value should be true if the application
                        object may be simultaneously invoked by
                        another thread in the same process, and
                        false otherwise.

``wsgi.multiprocess``  This value should be true if an equivalent
                        application object may be simultaneously
                        invoked by another process, and false
                        otherwise.

``wsgi.last_call``     This value should be true if this is expected
                        to be the last invocation of the application
                        in this process.  This is provided to allow
                        applications to optimize their setup for
                        long-running vs. short-running scenarios.
                        This flag should normally only be true for
                        CGI applications, or while a server is doing
                        some kind of "graceful shutdown".  Note that
                        a server or gateway is still allowed to invoke
                        the application again; this flag is only
                        a "suggestion" to the application that it is
                        unlikely to be reinvoked.
=====================  ==============================================

Finally, the ``environ`` dictionary may also contain server-defined
variables.  These variables should be named using only lower-case
letters, numbers, dots, and underscores, and should be prefixed with
a name that is unique to the defining server or gateway.  For
example, ``mod_python`` might define variables with names like
``mod_python.some_variable``.  This naming convention allows
"middleware" components to safely filter out extensions that they
do not understand.  (E.g. by deleting all keys from ``environ`` that
are all-lowercase and do not begin with ``"wsgi."``.)
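
The filtering rule in the parenthetical above could look roughly like
this.  It is only a sketch: ``strip_unknown_extensions`` and its
``known_prefixes`` parameter are hypothetical names, and the function
modifies ``environ`` in place:

```python
def strip_unknown_extensions(environ, known_prefixes=("wsgi.",)):
    """Delete server-defined keys that a middleware does not understand."""
    # Copy the keys first, because the dictionary is modified in the loop.
    for key in list(environ):
        # Server-defined names are all-lowercase; CGI names are upper-case.
        is_lowercase = key == key.lower()
        if is_lowercase and not key.startswith(tuple(known_prefixes)):
            del environ[key]
    return environ
```

A middleware that does understand some extension can pass a longer
``known_prefixes`` tuple, for example ``("wsgi.", "mod_python.")``, to
keep those keys.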


Input and Error Streams
~~~~~~~~~~~~~~~~~~~~~~~

The input and error streams provided by the server must support
the following methods:

===================  ==========  ========
Method               Files       Notes
===================  ==========  ========
``read(size)``       ``input``
``readline()``       ``input``   1
``readlines(hint)``  ``input``   2
``__iter__()``       ``input``
``flush()``          ``errors``  3
``write(str)``       ``errors``
``writelines(seq)``  ``errors``
===================  ==========  ========

The semantics of each method are as documented in the Python Library
Reference, except for these notes as listed in the table above:

1. The optional "size" argument to ``readline()`` is not supported,
    as it may be complex for server authors to implement, and is not
    often used in practice.

2. Note that the ``hint`` argument to ``readlines()`` is optional for
    both caller and implementer.  The application is free not to
    supply it, and the server or gateway is free to ignore it.

3. Since the ``errors`` stream may not be rewound, a container is
    free to forward write operations immediately, without buffering.
    In this case, the ``flush()`` method may be a no-op.  Portable
    applications, however, cannot assume that output is unbuffered
    or that ``flush()`` is a no-op.  They must call ``flush()`` if
    they need to ensure that output has in fact been written.  (For
    example, to minimize intermingling of data from multiple processes
    writing to the same error log.)

The methods listed in the table above *must* be supported by all
servers conforming to this specification.  Applications conforming
to this specification *must not* use any other methods or attributes
of the ``input`` or ``errors`` objects.  In particular, applications
*must not* attempt to close these streams, even if they possess
``close()`` methods.
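
As an illustration, an application that restricts itself to the methods
in the table might read the request body as follows.  This is a sketch
under assumptions: it is written for Python 3, it assumes a binary input
stream and a text error stream (which this draft does not itself
specify), and ``read_request_body`` is a hypothetical helper name:

```python
import io


def read_request_body(environ):
    """Read the request body using only methods the draft guarantees."""
    # CONTENT_LENGTH must be present, but may be an empty string.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)

    errors = environ["wsgi.errors"]
    errors.write("read %d bytes of request body\n" % len(body))
    # flush() may be a no-op, but portable code cannot assume that.
    errors.flush()
    return body


# Usage: in-memory streams stand in for what a real server would supply.
demo_environ = {
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"hello world"),
    "wsgi.errors": io.StringIO(),
}
demo_body = read_request_body(demo_environ)
```

Note that the helper never closes either stream, in line with the
restriction above.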


The ``start_response()`` Callable
---------------------------------

The second parameter passed to the application object is itself a
two-argument callable, used to begin the HTTP response and return
a ``write()`` callable.  The first parameter the ``start_response()``
callable takes is a "status" string, of the form ``"999 Message here"``,
where ``999`` is replaced with the HTTP status code, and ``Message here``
is replaced with the appropriate message text.  The string *must* be
pure 7-bit ASCII, containing no control characters.  In particular,
it must not be terminated with a carriage return or linefeed.

The second parameter accepted by the ``start_response()`` callable
must be a sequence of ``(header_name,header_value)`` tuples.  Each
``header_name`` must be a valid HTTP header name, without a
trailing colon or other punctuation.  Each ``header_value``
*must not* include carriage returns or linefeeds: it should be a raw
*unfolded* header value.  If the HTTP spec calls for folding of a
particular header, the server shall be responsible for performing the
folding.  (These requirements are to minimize the complexity of parsing
required by servers, gateways, and intermediate response processors
that need to inspect or modify response headers.)

In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits
a needed header, the server or gateway *should* add it.  For example,
the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
by the server or gateway.  If the application supplies a header that
the server would ordinarily supply, or that contradicts the server's
intended behavior (e.g. supplying a different ``Connection:`` header),
the server or gateway *may* discard the conflicting header, provided
that its action is recorded for the benefit of the application author.
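
A server, or a test harness for applications, could enforce the argument
rules above before sending anything.  The sketch below is illustrative
only: ``check_response_start`` is a hypothetical name, and the header-name
test is deliberately simplified rather than a full check of HTTP token
syntax:

```python
def check_response_start(status, headers):
    """Raise ValueError if ``status`` or ``headers`` break the rules above."""
    code, sep, message = status.partition(" ")
    if not (len(code) == 3 and code.isdigit() and sep and message):
        raise ValueError("status must look like '999 Message here'")
    # Pure 7-bit ASCII with no control characters; this also rules out a
    # trailing carriage return or linefeed.
    if any(ord(ch) < 32 or ord(ch) > 126 for ch in status):
        raise ValueError("status must be printable 7-bit ASCII")

    for name, value in headers:
        # Simplified name check: non-empty, no colon, no whitespace.
        if not name or ":" in name or any(ch.isspace() for ch in name):
            raise ValueError("bad header name: %r" % (name,))
        # Values must be raw and unfolded; folding is the server's job.
        if "\r" in value or "\n" in value:
            raise ValueError("header value must be unfolded: %r" % (value,))
```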


The ``write()`` Callable
------------------------

The return value of the ``start_response()`` callable is a one-argument
``write()`` callable, that accepts strings to write as part of the
HTTP response body.

Note that the purpose of the ``write()`` callable is primarily to
support existing application frameworks that support a streaming "push"
API.  Therefore, strings passed to ``write()`` *must* be sent to the
client *as soon as possible*; they must *not* be buffered unless the
buffer will be emptied in parallel with the application's continuing
execution (e.g. by a separate I/O thread).  If the server or gateway
does not have a separate I/O thread available, it *must* finish
writing the supplied string before it returns from each ``write()``
invocation.

If the application returns an iterable, each string produced by the
iterable must be treated as though it had been passed to ``write()``,
with the data sent in an "as soon as possible" manner.  That is,
the iterable should not be asked for a new string until the previous
string has been sent to the client, or is buffered for such sending
by a parallel thread.

Notice that these rules discourage the generation of content before a
client is ready for it, in excess of the buffer sizes provided by the
server and operating system.  For this reason, some applications may
wish to buffer data internally before passing any of it to ``write()``
or yielding it from an iterator, in order to avoid waiting for the
client to catch up with their output.  This approach may yield better
throughput for dynamically generated pages of moderate size, since the
application is then freed for other tasks.

In addition to improved performance, buffering all of an application's
output has an advantage for error handling: the buffered output can
be thrown away and replaced by an error page, rather than dumping an
error message in the middle of some partially-completed output.  For
this and other reasons, many existing Python frameworks already
accumulate their output for a single write, unless the application
explicitly requests streaming, or the expected output is larger than
practical for buffering (e.g. multi-megabyte PDFs).  So, these
application frameworks are already a natural fit for the WSGI
streaming model: for most requests they will only call ``write()``
once anyway!
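
The buffering strategy described above might be sketched like this,
following this draft's convention that ``start_response()`` returns
``write()``.  The names ``buffered`` and ``render`` are hypothetical;
``render(environ)`` is assumed to be any callable that yields byte
strings and may raise part-way through:

```python
def buffered(render):
    """Wrap ``render`` so its whole output is sent with a single write()."""
    def app(environ, start_response):
        try:
            # Collect everything first; nothing has reached the client yet.
            body = b"".join(render(environ))
            status = "200 OK"
        except Exception:
            # The partial output is simply discarded and replaced.
            body = b"Internal error\n"
            status = "500 Internal Server Error"
        headers = [("Content-type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        write = start_response(status, headers)
        write(body)
    return app
```

Because the body is complete before ``start_response()`` is called, the
application can also supply an accurate ``Content-Length``.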


Implementation/Application Notes
================================


Unicode
-------

HTTP does not directly support Unicode, and neither does this
interface.  All encoding/decoding must be handled by the application;
all strings and streams passed to or from the server must be standard
Python byte strings, not Unicode objects.  The result of using a
Unicode object where a string object is required is undefined.
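
In practice this means the application picks a character set, encodes
its text, and declares the choice in its headers.  A small sketch,
written for Python 3 where the byte string type is ``bytes``;
``encode_body`` is a hypothetical helper and UTF-8 is only an example
default:

```python
def encode_body(text, charset="utf-8"):
    """Encode ``text`` and build matching headers for start_response()."""
    body = text.encode(charset)
    headers = [
        ("Content-type", "text/plain; charset=%s" % charset),
        # The length is counted in bytes, not in characters.
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```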


Multiple Invocations
--------------------

Application objects must be able to be invoked more than once, since
virtually all servers/gateways will make such requests.


Error Handling
--------------

Servers *should* trap and log exceptions raised by
applications, and *may* continue to execute, or attempt to shut down
gracefully.  Applications *should* avoid allowing exceptions to
escape their execution scope, since the result of uncaught exceptions
is server-defined.


Thread Support
--------------

Thread support, or lack thereof, is also server-dependent.
Servers that can run multiple requests in parallel, *should* also
provide the option of running an application in a single-threaded
fashion, so that applications or frameworks that are not thread-safe
may still be used with that server.


URL Reconstruction
------------------

If an application wishes to reconstruct a request's complete URL,
it may do so using the following algorithm, contributed by Ian
Bicking::

     if environ.get('HTTPS') == 'on':
         url = 'https://'
     else:
         url = 'http://'

     if environ.get('HTTP_HOST'):
         url += environ['HTTP_HOST']
     else:
         url += environ['SERVER_NAME']

     if environ.get('HTTPS') == 'on':
         if environ['SERVER_PORT'] != '443':
            url += ':' + environ['SERVER_PORT']
     else:
         if environ['SERVER_PORT'] != '80':
            url += ':' + environ['SERVER_PORT']

     url += environ['SCRIPT_NAME']
     url += environ['PATH_INFO']
     if environ.get('QUERY_STRING'):
         url += '?' + environ['QUERY_STRING']

Note that such a reconstructed URL may not be precisely the
same URI as requested by the client.  Server rewrite rules, for
example, may have modified the client's originally requested URL
to place it in a canonical form.


Application Configuration
-------------------------

This specification does not define how a server selects or
obtains an application to invoke.  These and other configuration
options are highly server-specific matters.  It is expected that
server/gateway authors will document how to configure the server to
execute a particular application object, and with what options (such
as threading options).

Framework authors, on the other hand, should document how to create
an application object that wraps their framework's functionality.
The user, who has chosen both the server and the application
framework, must connect the two together.  However, since both the
framework and the server now have a common interface, this should
be merely a mechanical matter, rather than a significant engineering
effort for each new server/framework pair.


Middleware
----------

Note that a single object may play the role of a server with respect
to some application(s), while also acting as an application with
respect to some server(s).  Such "middleware" components can perform
such functions as:

   * Routing a request to different application objects based on the
     target URL, after rewriting the ``environ`` accordingly.

   * Allowing multiple applications or frameworks to run side-by-side
     in the same process

   * Load balancing and remote processing, by forwarding requests and
     responses over a network

   * Performing content postprocessing, such as applying XSL stylesheets

Given the existence of applications and servers conforming to this
specification, the appearance of such reusable middleware becomes
a possibility.

Middleware components that transform the request or response data
should in general remove WSGI extension data from the ``environ``
that the middleware does not understand, to prevent applications
from inadvertently bypassing the middleware's mediation of the
interaction by use of a server extension.  The simplest way to do
this is to just delete keys from ``environ`` that are all lowercase
and do not begin with ``"wsgi."``, before passing the ``environ``
on to the application.
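
To make the routing case from the list above concrete, here is a rough
sketch of a middleware that dispatches on a path prefix.  The name
``route_by_prefix`` and the shape of its ``apps`` mapping are
assumptions made for illustration, not part of this specification:

```python
def route_by_prefix(apps, default):
    """Return a middleware that dispatches on the first matching prefix.

    ``apps`` maps prefixes such as "/blog" to application objects;
    ``default`` handles every request that matches no prefix.
    """
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in apps.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME, so
                # the target application sees its own virtual "location".
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return default(environ, start_response)
    return middleware
```

To its server the returned callable looks like an ordinary application,
while to the wrapped applications it plays the role of the server.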


HTTP 1.1 Expect/Continue
------------------------

Servers and gateways *must* provide transparent support for HTTP 1.1's
"expect/continue" mechanism, if they implement HTTP 1.1.  This may be
done in any of several ways:

  1. Reject all client requests containing an ``Expect: 100-continue``
     header with a "417 Expectation failed" error.  Such requests will
     not be forwarded to an application object.

  2. Respond to requests containing an ``Expect: 100-continue`` request
     with an immediate "100 Continue" response, and proceed normally.

  3. Proceed with the request normally, but provide the application with
     a ``wsgi.input`` stream that will send the "100 Continue" response
     if/when the application first attempts to read from the input
     stream.  The read request must then remain blocked until the client
     responds.

Note that this behavior restriction does not apply for HTTP 1.0 requests,
or for requests that are not directed to an application object.  For more
information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3
and 10.1.1.
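
The third option can be sketched as a thin wrapper around the real input
stream.  This is an illustration only: ``ContinueOnRead`` is a
hypothetical class, and ``send_continue`` stands for whatever callback a
particular server would use to emit the interim response:

```python
import io


class ContinueOnRead:
    """Input stream wrapper that sends "100 Continue" on the first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue  # hypothetical server callback
        self._sent = False

    def _before_read(self):
        # Send the interim response once, just before the first read.
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size=-1):
        self._before_read()
        return self._stream.read(size)

    def readline(self):
        self._before_read()
        return self._stream.readline()

    def readlines(self, hint=-1):
        self._before_read()
        return self._stream.readlines(hint)

    def __iter__(self):
        # Iteration counts as reading, so the response is sent here too.
        self._before_read()
        return iter(self._stream)


# Usage: a list records the callback instead of writing to a socket.
continue_calls = []
wrapped = ContinueOnRead(io.BytesIO(b"a\nb\n"),
                         lambda: continue_calls.append("100 Continue"))
```

An application that never touches ``wsgi.input`` never triggers the
callback, so no "100 Continue" is sent for requests whose body is
ignored.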


Questions and Answers
=====================

1. Why must ``environ`` be a dictionary?  What's wrong with using
    a subclass?

    The rationale for requiring a dictionary is to maximize
    portability between servers.  The alternative would be to define
    some subset of a dictionary's methods as being the standard and
    portable interface.  In practice, however, most servers will
    probably find a dictionary adequate to their needs, and thus
    framework authors will come to expect the full set of dictionary
    features to be available, since they will be there more often
    than not.  But, if some server chooses *not* to use a dictionary,
    then there will be interoperability problems despite that
    server's "conformance" to spec.  Therefore, making a dictionary
    mandatory simplifies the specification and guarantees
    interoperability.

    Note that this does not prevent server or framework developers
    from offering specialized services as custom variables *inside*
    the ``environ`` dictionary.  This is the recommended approach
    for offering any such value-added services.

2. Why can you call ``write()`` *and* yield strings/return an
    iterator?  Shouldn't we pick just one way?

    If we supported only the iteration approach, then current
    frameworks that assume the availability of "push" suffer.
    But, if we only support pushing via ``write()``, then
    server performance suffers for transmission of e.g. large
    files (if a worker thread can't begin work on a new request
    until all of the output has been sent).  Thus, this compromise
    allows an application framework to support both approaches, as
    appropriate, but with only a little more burden to the server
    implementor than a push-only approach would require.

3. What's the ``close()`` for?

    When writes are done during the execution of an application
    object, the application can ensure that resources are released
    using a try/finally block.  But, if the application returns an
    iterator, any resources used will not be released until the
    iterator is garbage collected.  The ``close()`` idiom allows
    an application to release critical resources at the end of a
    request, and it's forward-compatible with the support for
    try/finally in generators that's proposed by PEP 325.

4. Why is this interface so low-level?  I want feature X!  (e.g.
    cookies, sessions, persistence, ...)

    This isn't Yet Another Python Web Framework.  It's just a way
    for frameworks to talk to web servers, and vice versa.  If you
    want these features, you need to pick a web framework that
    provides the features you want.  And if that framework lets
    you create a WSGI application, you should be able to run it
    in most WSGI-supporting servers.  Also, some WSGI servers may
    offer additional services via objects provided in their
    ``environ`` dictionary; see the applicable server documentation
    for details.  (Of course, applications that use such extensions
    will not be portable to other WSGI-based servers.)

5. Why use CGI variables instead of good old HTTP headers?  And
    why mix them in with WSGI-defined variables?

    Many existing web frameworks are built heavily upon the CGI spec,
    and existing web servers know how to generate CGI variables.  In
    contrast, alternative ways of representing inbound HTTP information
    are fragmented and lack market share.  Thus, using the CGI
    "standard" seems like a good way to leverage existing
    implementations.  As for mixing them with WSGI variables, separating
    them would just require two dictionary arguments to be passed
    around, while providing no real benefits.

6. What about the status string?  Can't we just use the number,
    passing in ``200`` instead of ``"200 OK"``?

    Doing this would complicate the server or gateway, by requiring
    them to have a table of numeric statuses and corresponding
    messages.  By contrast, it is easy for an application or framework
    author to type the extra text to go with the specific response code
    they are using, and existing frameworks often already have a table
    containing the needed messages.  So, on balance it seems better to
    make the application/framework responsible, rather than the server
    or gateway.
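
The ``close()`` idiom from question 3 can be sketched as follows.  Both
``FileBody`` and ``send_body`` are hypothetical names; the server-side
loop mirrors the try/finally pattern used by the CGI gateway example
elsewhere in this draft:

```python
import io


class FileBody:
    """Iterable response body that releases its file in ``close()``."""

    def __init__(self, fileobj, blocksize=8192):
        self.fileobj = fileobj
        self.blocksize = blocksize

    def __iter__(self):
        while True:
            block = self.fileobj.read(self.blocksize)
            if not block:
                break
            yield block

    def close(self):
        self.fileobj.close()


def send_body(result, write):
    """Server side: always call ``close()`` if the iterable has one."""
    try:
        for data in result:
            write(data)
    finally:
        if hasattr(result, "close"):
            result.close()


# Usage: an in-memory file stands in for a real open file.
source = io.BytesIO(b"abcdef")
sent = []
send_body(FileBody(source, blocksize=4), sent.append)
```

The file is closed as soon as the response has been sent, or as soon as
sending fails, rather than whenever the iterator happens to be garbage
collected.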


Acknowledgements
================

Thanks go to the many folks on the Web-SIG mailing list whose
thoughtful feedback made this revised draft possible.  Especially:

  * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who
    beat up on the first draft as not offering any advantages
    over "plain old CGI", thus encouraging me to look for a
    better approach.

  * Ian Bicking, who helped nag me into properly specifying
    the multithreading and multiprocess options, as well as
    badgering me to provide a mechanism for servers to supply
    custom extension data to an application.

  * Tony Lownds, who came up with the concept of a ``start_response``
    function that took the status and headers, returning a ``write``
    function.


References
==========

.. [1] The Python Wiki "Web Programming" topic
    (http://www.python.org/cgi-bin/moinmoin/WebProgramming)

.. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    End:

From pje at telecommunity.com  Sat Aug 21 20:28:06 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 21 20:27:52 2004
Subject: [Web-SIG] Latest WSGI Draft
Message-ID: <5.1.1.6.0.20040821142742.027283d0@mail.telecommunity.com>

At 10:51 AM 8/21/04 -0700, tony@lownds.com wrote:
> > I think this is just about ready to submit as an official PEP, get a
> > numbering, and post to c.l.py and Python-Dev, but of course I could be
> > wrong.  Your feedback is appreciated.
>
>+1 on PEPing it
>
>This diff addresses one typo, qualifies the claim of 1.5.2 support a bit,
>and adds some language that I imagine server implementors who want to
>support Keep-alive would find useful. None of these changes are that
>important to me, if you disagree with them in the PEP I could still
>comment on them.

I think I'm going to expand on the chunked-encoding part a bit, because 
you've got some good stuff in there about when the server can and can't 
supply an omitted 'Content-Length'.    I think it should actually go in the 
section about the server supplying omitted headers, rather than buried in a 
later note.  The "application note" should just mention the option of using 
chunked encoding as an alternative to closing the connection when 
Content-Length isn't supplied by the app.

As for checking the length of the iterable, I'm okay with that, but it 
should be wrapped in a try: because it shouldn't be required that the 
iterable have a __len__ method.

Finally, regarding 1.5.2, I'm fine with dropping that claim altogether, if 
it saves us having to spell out the pre-2.2 iteration protocol 
(__len__/__getitem__/IndexError).

From ianb at colorstudy.com  Sun Aug 22 06:29:03 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Aug 22 06:55:50 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
Message-ID: <4128210F.3050901@colorstudy.com>

Phillip J. Eby wrote:
> Once again, please pardon me if I missed an update, and gently remind me 
> with a clue by four if need be.  :)  Or better yet, by supplying a patch 
> implementing your suggested changes.  :)
> 
> I was going to post a diff, but even a unified diff is about as long as 
> the previous version was, and the new draft is almost 50% longer than 
> the old one, as lots of new material has been added about streaming, URL 
> determination, required CGI variables, etc. etc.  There's even some 
> extra material in the Rationale and Goals about using WSGI middleware to 
> better modularize
> frameworks, allowing more mix-and-match between them.
> 
> I think this is just about ready to submit as an official PEP, get a 
> numbering, and post to c.l.py and Python-Dev, but of course I could be 
> wrong.  Your feedback is appreciated.

I think it's ready as well.  I have only a couple small comments, which 
are mostly about language.  There's going to be more discussion later 
anyway, so why not get started with the second round.

> PEP: XXX

For some reason this got caught as spam.  I blame it on these triple Xs.

> Title: Python Web Server Gateway Interface v1.0
> Version: $Revision: 1.1 $
> Last-Modified: $Date: 2004/08/20 19:11:27 $
> Author: Phillip J. Eby <pje@telecommunity.com>
> Discussions-To: Python Web-SIG <web-sig@python.org>
> Status: Draft
> Type: Informational
> Content-Type: text/x-rst
> Created: 07-Dec-2003
> Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004
> 
> 
> Abstract
> ========
> 
> This document specifies a proposed standard interface between web
> servers and Python web applications or frameworks, to promote
> web application portability across a variety of web servers.
> 
> 
> Rationale and Goals
> ===================
> 
> Python currently boasts a wide variety of web application
> frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO,
> and Twisted Web -- to name just a few [1]_.  This wide variety
> of choices can be a problem for new Python users, because
> generally speaking, their choice of web framework will limit
> their choice of usable web servers, and vice versa.
> 
> By contrast, although Java has just as many web application
> frameworks available, Java's "servlet" API makes it possible
> for applications written with any Java web application framework
> to run in any web server that supports the servlet API.
> 
> The availability and widespread use of such an API in web
> servers for Python -- whether those servers are written in
> Python (e.g. Medusa), embed Python (e.g. mod_python), or
> invoke Python via a gateway protocol (e.g. CGI, FastCGI,
> etc.) -- would separate choice of framework from choice
> of web server, freeing users to choose a pairing that suits
> them, while freeing framework and server developers to focus
> on their area of specialty.
> 
> This PEP, therefore, proposes a simple and universal interface
> between web servers and web applications or frameworks: the
> Python Web Server Gateway Interface (WSGI).
> 
> But the mere existence of a WSGI spec does nothing to address the
> existing state of servers and frameworks for Python web applications.
> Server and framework authors and maintainers must actually implement
> WSGI for there to be any effect.
> 
> However, since no existing servers or frameworks support WSGI, there
> is little immediate reward for an author who implements WSGI support.
> Thus, WSGI *must* be easy to implement, so that an author's initial
> investment in the interface can be reasonably low.
> 
> Thus, simplicity of implementation on *both* the server and framework
> sides of the interface is absolutely critical to the utility of the
> WSGI interface, and is therefore the principal criterion for any
> design decisions.
> 
> Note, however, that simplicity of implementation for a framework
> author is not the same thing as ease of use for a web application
> author.  WSGI presents an absolutely "no frills" interface to the
> framework author, because bells and whistles like response objects
> and cookie handling would just get in the way of existing frameworks'
> handling of these issues.  Again, the goal of WSGI is to facilitate
> easy interconnection of existing servers and applications or
> frameworks, not to create a new web framework.
> 
> Note also that this goal precludes WSGI from requiring anything that
> is not already available in deployed versions of Python.  Therefore,
> new standard library modules are not proposed or required by this
> specification, and nothing in WSGI requires a Python version greater
> than 1.5.2.  (It would be a good idea, however, for future versions
> of Python to include support for this interface in web servers
> provided by the standard library.)

Like you said, maybe 1.5.2 is optimistic.  The spec works for 1.5.2, but 
most servers and applications will have higher requirements, and the 
iteration is annoying to handle in those versions.

> In addition to ease of implementation for existing and future
> frameworks and servers, it should also be easy to create request
> preprocessors, response postprocessors, and other WSGI-based
> "middleware" components that look like an application to their
> containing server, while acting as a server for their contained
> applications.
> 
> If middleware can be both simple and robust, and WSGI is widely
> available in servers and frameworks, it allows for the possibility
> of an entirely new kind of Python web application framework: one
> consisting of loosely-coupled WSGI middleware components.  Indeed,
> existing framework authors may even choose to refactor their
> frameworks' existing services to be provided in this way, becoming
> more like libraries used with WSGI, and less like monolithic
> frameworks.  This would then allow application developers to choose
> "best-of-breed" components for specific functionality, rather than
> having to commit to all the pros and cons of a single framework.
> 
> Of course, as of this writing, that day is doubtless quite far off.
> In the meantime, it is a sufficient short-term goal for WSGI to
> enable the use of any framework with any server.

That's an awfully pessimistic paragraph ;)

> Finally, it should be mentioned that the current version of WSGI
> does not prescribe any particular mechanism for "deploying" an
> application for use with a web server or server gateway.  At the
> present time, this is necessarily implementation-defined by the
> server or gateway.  After a sufficient number of servers and
> frameworks have implemented WSGI to provide field experience with
> varying deployment requirements, it may make sense to create
> another PEP, describing a deployment standard for WSGI servers and
> application frameworks.
> 
> 
> 
> Specification Overview
> ======================
> 
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side.  The server side invokes a callable
> object that is provided by the application side.  The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object.  Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.

Maybe "gateway" is just distracting.

> The application object is simply a callable object that accepts
> two arguments.  The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object.  Here are two example application
> objects; one is a function, and the other is a class::
> 
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')
> 
> 
>     class AppClass:
>         """Much the same thing, but as a class"""
> 
>         def __init__(self, environ, start_response):
>             self.environ = environ
>             self.start = start_response
> 
>         def __iter__(self):
>             status = '200 OK'
>             headers = [('Content-type','text/plain')]
>             self.start(status, headers)
> 
>             yield "Hello world!\n"
>             for i in range(1,11):
>                 yield "Extra line %s\n" % i

This second example confuses me.  Though as I reread it I realize more 
clearly what it's doing; __init__ is the callable (in essence), but self 
is automatically returned.  I think an instance with a __call__ method 
would be easier to understand.  OTOH, there's more concurrency overhead. 
  I dunno.  Anyway, that one confused me.

> The server or gateway invokes the application once for each request
> it receives from a web browser.  To illustrate, here is a simple
> CGI gateway, implemented as a function taking an application object
> (all error handling omitted)::
> 
>     import os, sys
> 
>     def run_with_cgi(application):
> 
>         environ = {}
>         environ.update(os.environ)
>         environ['wsgi.input']        = sys.stdin
>         environ['wsgi.errors']       = sys.stderr
>         environ['wsgi.version']      = (1,0)
>         environ['wsgi.multithread']  = False
>         environ['wsgi.multiprocess'] = True
>         environ['wsgi.last_call']    = True
> 
>         def start_response(status,headers):
>             print "Status:", status
>             for key,val in headers:
>                 print "%s: %s" % (key,val)
>             return sys.stdout.write
> 
>         result = application(environ, start_response)
>         if result:
>             try:
>                 for data in result:
>                     sys.stdout.write(data)
>             finally:
>                 if hasattr(result,'close'):
>                     result.close()
> 
> In the next section, we will specify the precise semantics that
> these illustrations are examples of.
> 
> 
> Specification Details
> =====================
> 
> The application object must accept two positional arguments.  For
> the sake of illustration, we have named them ``environ``, and
> ``start_response``, but they are not required to have these names.
> A server or gateway *must* invoke the application object using
> positional (not keyword) arguments.
> 
> The first parameter is a dictionary object, containing CGI-style
> environment variables.  

I think the spec is easier to understand if you use names here, i.e., 
"environ is a dictionary object".  Or remind the reader of the 
invocation, i.e., note application(environ, start_response) is called.

> This object *must* be a builtin Python
> dictionary (*not* a subclass, ``UserDict`` or other dictionary
> emulation), and the application is allowed to modify the dictionary
> in any way it desires.  The dictionary must also include certain
> WSGI-required variables (described in a later section), and may
> also include server-specific extension variables, named according
> to a convention that will be described below.
> 
> The second parameter is a callable accepting two positional
> arguments: a status string of the form ``"999 Message here"``,
> and a list of ``(header_name,header_value)`` tuples describing the
> HTTP response header.  This callable must return another callable
> that takes one parameter: a string to write as part of the HTTP
> response body.

"This callable must return a writing function: a function that takes a 
single string as an argument, which is written as the HTTP response body."

I guess "function" is more specific than "callable", but it seems easier 
to understand.  Though honestly, I find the CGI example the easiest way 
to understand this, so maybe being more accurate here is fine.
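
To make the "writing function" wording concrete, here is a small sketch of an application that pushes its output through the callable returned by start_response.  The names push_app and collecting_start_response are hypothetical; the second is only a stand-in for a real server so the example can run on its own.

```python
def push_app(environ, start_response):
    # start_response returns the writing function described above.
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write("Hello world!\n")
    write("pushed, not yielded\n")
    # Nothing further to output, so return None as the draft allows.
    return None


def collecting_start_response(sink):
    """Hypothetical stand-in for a server: records everything in sink."""
    def start_response(status, headers):
        sink.append(('status', status))
        sink.append(('headers', headers))

        def write(data):
            sink.append(('body', data))
        return write
    return start_response


sink = []
result = push_app({}, collecting_start_response(sink))
```

After the call, sink holds the status, the header list, and the two body strings in the order they were written.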

> The application object may return either ``None`` (indicating that
> there is no additional output), or it may return a non-empty
> iterable yielding strings.  (For example, it could be a
> generator-iterator that yields strings, or it could be a
> sequence such as a list of strings.)  The server or gateway will
> treat the strings yielded by the iterable as if they had been
> passed to the ``write()`` method.
> 
> Also, if the application returns an iterable, and the iterable has a
> ``close()`` method, the server or gateway *must* call that method
> upon completion of the current request, whether the request was
> completed normally, or terminated early due to an error.  (This is to
> support resource release by the application.  This protocol is
> intended to support PEP 325, and also the simple case of an
> application returning an open text file.)
> 
> 
> ``environ`` Variables
> ---------------------
> 
> The ``environ`` dictionary is required to contain these CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_.  The following variables *must* be present, but *may* be an empty
> string, if there is no more appropriate value for them:
> 
>  * ``REQUEST_METHOD``
> 
>  * ``SCRIPT_NAME`` (The initial portion of the request URL's "path" that
>    corresponds to the application object, so that the application knows
>    its virtual "location".)
> 
>  * ``PATH_INFO`` (The remainder of the request URL's "path", designating
>    the virtual "location" of the request's target within the application)
> 
>  * ``QUERY_STRING``
> 
>  * ``CONTENT_TYPE``
> 
>  * ``CONTENT_LENGTH``
> 
>  * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
>    ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the

You forgot to finish your sentence.  Also SERVER_NAME is a fallback if 
HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical 
host name, not necessarily the actual host name.

>  * Variables corresponding to the client-supplied HTTP headers (i.e.,
>    variables whose names begin with ``"HTTP_"``).
> 
> In general, a server or gateway should attempt to provide as many
> other CGI variables as are applicable, including e.g. the nonstandard
> SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
> However, an application that uses any variables other than the ones
> listed above is necessarily non-portable to web servers that do not
> support the relevant extensions.
> 
> A WSGI-compliant server or gateway *should* document what variables
> it provides, along with their definitions as appropriate.  Applications
> *should* check for the presence of any nonstandard variables they
> require, and have a fallback plan in the event such a variable is
> absent.
> 
> Note: missing variables (such as ``REMOTE_USER`` when no
> authentication has occurred) should be left out of the ``environ``
> dictionary.  Also note that CGI-defined variables must be strings,
> if they are present at all.  It is a violation of this specification
> for a CGI variable's value to be of any type other than ``str``.
> 
> In addition to the CGI-defined variables, the ``environ`` dictionary
> must also contain the following WSGI-defined variables:
> 
> =====================  ==============================================
> Variable               Value
> =====================  ==============================================
> ``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>                        version 1.0.
> 
> ``wsgi.input``         An input stream from which the HTTP request
>                        body can be read.
> 
> ``wsgi.errors``        An output stream to which error output can
>                        be written.  For most servers, this will be
>                        the server's error log.
> 
> ``wsgi.multithread``   This value should be true if the application
>                        object may be simultaneously invoked by
>                        another thread in the same process, and
>                        false otherwise.
> 
> ``wsgi.multiprocess``  This value should be true if an equivalent
>                        application object may be simultaneously
>                        invoked by another process, and false
>                        otherwise.
> 
> ``wsgi.last_call``     This value should be true if this is expected
>                        to be the last invocation of the application
>                        in this process.  This is provided to allow
>                        applications to optimize their setup for
>                        long-running vs. short-running scenarios.
>                        This flag should normally only be true for
>                        CGI applications, or while a server is doing
>                        some kind of "graceful shutdown".  Note that
>                        a server or gateway is still allowed to invoke
>                        the application again; this flag is only
>                        a "suggestion" to the application that it is
>                        unlikely to be reinvoked.

wsgi.last_call seems too complicated from this.  Really, it's for CGI and 
nothing else.  Maybe just wsgi.cgi?  wsgi.run_once?  I think the 
semantics shouldn't be any more general than that.  Then we can also 
guarantee that it won't be called again.
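
A sketch of how an application might use such a flag, assuming it were spelled ``wsgi.run_once`` (a name suggested above, not one the draft defines).  The function choose_setup and its return values are purely illustrative.

```python
def choose_setup(environ):
    """Pick a setup strategy from the proposed (hypothetical) run-once flag."""
    if environ.get('wsgi.run_once', False):
        # CGI-style: the process exits after this request, so skip
        # caches and precompilation that would never be reused.
        return 'lazy'
    # Long-running server: pay the setup cost once, up front.
    return 'eager'
```

With a guarantee that the application is not called again, the 'lazy' branch needs no fallback for a second request.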

> =====================  ==============================================
> 
> Finally, the ``environ`` dictionary may also contain server-defined
> variables.  These variables should be named using only lower-case
> letters, numbers, dots, and underscores, and should be prefixed with
> a name that is unique to the defining server or gateway.  For
> example, ``mod_python`` might define variables with names like
> ``mod_python.some_variable``.  This naming convention allows
> "middleware" components to safely filter out extensions that they
> do not understand.  (E.g. by deleting all keys from ``environ`` that
> are all-lowercase and do not begin with ``"wsgi."``.)
> 
> 
> Input and Error Streams
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> The input and error streams provided by the server must support
> the following methods:
> 
> ===================  ==========  ========
> Method               Files       Notes
> ===================  ==========  ========
> ``read(size)``       ``input``
> ``readline()``       ``input``   1
> ``readlines(hint)``  ``input``   2
> ``__iter__()``       ``input``
> ``flush()``          ``errors``  3
> ``write(str)``       ``errors``
> ``writelines(seq)``  ``errors``
> ===================  ==========  ========
> 
> The semantics of each method are as documented in the Python Library
> Reference, except for these notes as listed in the table above:
> 
> 1. The optional "size" argument to ``readline()`` is not supported,
>    as it may be complex for server authors to implement, and is not
>    often used in practice.
> 
> 2. Note that the ``hint`` argument to ``readlines()`` is optional for
>    both caller and implementer.  The application is free not to
>    supply it, and the server or gateway is free to ignore it.
> 
> 3. Since the ``errors`` stream may not be rewound, a container is
>    free to forward write operations immediately, without buffering.
>    In this case, the ``flush()`` method may be a no-op.  Portable
>    applications, however, cannot assume that output is unbuffered
>    or that ``flush()`` is a no-op.  They must call ``flush()`` if
>    they need to ensure that output has in fact been written.  (For
>    example, to minimize intermingling of data from multiple processes
>    writing to the same error log.)
> 
> The methods listed in the table above *must* be supported by all
> servers conforming to this specification.  Applications conforming
> to this specification *must not* use any other methods or attributes
> of the ``input`` or ``errors`` objects.  In particular, applications
> *must not* attempt to close these streams, even if they possess
> ``close()`` methods.
> 
> 
> The ``start_response()`` Callable
> ---------------------------------
> 
> The second parameter passed to the application object is itself a
> two-argument callable, used to begin the HTTP response and return
> a ``write()`` callable.  

"The second parameter passed to the application object (start_response) 
is a callable, used like ``start_response(status, headers)``.

The status argument is a string like "404 Not Found" or "200 OK".  This 
string must be pure 7-bit ASCII, containing no control characters, and 
not terminated with a return or linefeed.

The headers argument is a sequence of ``(header_name, header_value)`` 
tuples.  Each ``header_name`` must be a valid... (and continuing on with 
your text).

Though I'm not clear what "folding" means.  I'm guessing you mean:

Header: blah
     continuing Header content

Does the HTTP spec care about folding?  Seems like a distraction to 
mention it.
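
For what it's worth, RFC 2616 (section 4.2) does allow a header field to continue onto following lines that begin with a space or tab, which is the "folding" in question.  The rules in the proposed wording can be sketched as a checker like the one below; check_start_response_args is a hypothetical helper, not something the draft defines, and it only covers the constraints quoted here.

```python
def check_start_response_args(status, headers):
    """Raise ValueError if status or headers break the rules quoted above."""
    # Status: 7-bit ASCII with no control characters, which also rules
    # out a trailing carriage return or linefeed.
    for ch in status:
        if ord(ch) < 32 or ord(ch) >= 127:
            raise ValueError("bad character in status: %r" % ch)
    # Expected shape: three digits, a space, then the message text.
    code, sep, message = status.partition(' ')
    if len(code) != 3 or not code.isdigit() or not sep or not message:
        raise ValueError("status must look like '200 OK': %r" % status)
    for name, value in headers:
        if name.endswith(':'):
            raise ValueError("header name has a trailing colon: %r" % name)
        # Unfolded values only: no CR or LF anywhere in the value.
        if '\r' in value or '\n' in value:
            raise ValueError("header value is folded: %r" % name)
```

A server or middleware could run a check like this before emitting anything, so that a malformed status or a folded header fails early instead of corrupting the response.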

> The first parameter the ``start_response()``
> callable takes is a "status" string, of the form ``"999 Message here"``,
> where ``999`` is replaced with the HTTP status code, and ``Message here``
> is replaced with the appropriate message text.  The string *must* be
> pure 7-bit ASCII, containing no control characters.  In particular,
> it must not be terminated with a carriage return or linefeed.
> 
> The second parameter accepted by the ``start_response()`` callable
> must be a sequence of ``(header_name,header_value)`` tuples.  Each
> ``header_name`` must be a valid HTTP header name, without a
> trailing colon or other punctuation.  Each ``header_value``
> *must not* include carriage returns or linefeeds: it should be a raw
> *unfolded* header value.  If the HTTP spec calls for folding of a
> particular header, the server shall be responsible for performing the
> folding.  (These requirements are to minimize the complexity of parsing
> required by servers, gateways, and intermediate response processors
> that need to inspect or modify response headers.)
> 
> In general, the server or gateway is responsible for ensuring that
> correct headers are sent to the client: if the application omits
> a needed header, the server or gateway *should* add it.  For example,
> the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
> by the server or gateway.  If the application supplies a header that
> the server would ordinarily supply, or that contradicts the server's
> intended behavior (e.g. supplying a different ``Connection:`` header),
> the server or gateway *may* discard the conflicting header, provided
> that its action is recorded for the benefit of the application author.
> 
> 
> The ``write()`` Callable
> ------------------------
> 
> The return value of the ``start_response()`` callable is a one-argument
> ``write()`` callable, that accepts strings to write as part of the
> HTTP response body.
> 
> Note that the purpose of the ``write()`` callable is primarily to
> support existing application frameworks that support a streaming "push"
> API.  Therefore, strings passed to ``write()`` *must* be sent to the
> client *as soon as possible*; they must *not* be buffered unless the
> buffer will be emptied in parallel with the application's continuing
> execution (e.g. by a separate I/O thread).  If the server or gateway
> does not have a separate I/O thread available, it *must* finish
> writing the supplied string before it returns from each ``write()``
> invocation.
> 
> If the application returns an iterable, each string produced by the
> iterable must be treated as though it had been passed to ``write()``,
> with the data sent in an "as soon as possible" manner.  That is,
> the iterable should not be asked for a new string until the previous
> string has been sent to the client, or is buffered for such sending
> by a parallel thread.
> 
> Notice that these rules discourage the generation of content before a
> client is ready for it, in excess of the buffer sizes provided by the
> server and operating system.  For this reason, some applications may
> wish to buffer data internally before passing any of it to ``write()``
> or yielding it from an iterator, in order to avoid waiting for the
> client to catch up with their output.  This approach may yield better
> throughput for dynamically generated pages of moderate size, since the
> application is then freed for other tasks.
> 
> In addition to improved performance, buffering all of an application's
> output has an advantage for error handling: the buffered output can
> be thrown away and replaced by an error page, rather than dumping an
> error message in the middle of some partially-completed output.  For
> this and other reasons, many existing Python frameworks already
> accumulate their output for a single write, unless the application
> explicitly requests streaming, or the expected output is larger than
> practical for buffering (e.g. multi-megabyte PDFs).  So, these
> application frameworks are already a natural fit for the WSGI
> streaming model: for most requests they will only call ``write()``
> once anyway!
> 
> 
> Implementation/Application Notes
> ================================
> 
> 
> Unicode
> -------
> 
> HTTP does not directly support Unicode, and neither does this
> interface.  All encoding/decoding must be handled by the application;
> all strings and streams passed to or from the server must be standard
> Python byte strings, not Unicode objects.  The result of using a
> Unicode object where a string object is required, is undefined.
> 
> 
> Multiple Invocations
> --------------------
> 
> Application objects must be able to be invoked more than once, since
> virtually all servers/gateways will make such requests.
> 
> 
> Error Handling
> --------------
> 
> Servers *should* trap and log exceptions raised by
> applications, and *may* continue to execute, or attempt to shut down
> gracefully.  Applications *should* avoid allowing exceptions to
> escape their execution scope, since the result of uncaught exceptions
> is server-defined.
> 
> 
> Thread Support
> --------------
> 
> Thread support, or lack thereof, is also server-dependent.
> Servers that can run multiple requests in parallel, *should* also
> provide the option of running an application in a single-threaded
> fashion, so that applications or frameworks that are not thread-safe
> may still be used with that server.
> 
> 
> URL Reconstruction
> ------------------
> 
> If an application wishes to reconstruct a request's complete URL,
> it may do so using the following algorithm, contributed by Ian
> Bicking::
> 
>     if environ.get('HTTPS') == 'on':
>         url = 'https://'
>     else:
>         url = 'http://'
> 
>     if environ.get('HTTP_HOST'):
>         url += environ['HTTP_HOST']
>     else:
>         url += environ['SERVER_NAME']
> 
>     if environ.get('HTTPS') == 'on':
>         if environ['SERVER_PORT'] != '443':
>             url += ':' + environ['SERVER_PORT']
>     else:
>         if environ['SERVER_PORT'] != '80':
>             url += ':' + environ['SERVER_PORT']
> 
>     url += environ['SCRIPT_NAME']
>     url += environ['PATH_INFO']
>     if environ.get('QUERY_STRING'):
>         url += '?' + environ['QUERY_STRING']
> 
> Note that such a reconstructed URL may not be precisely the
> same URI as requested by the client.  Server rewrite rules, for
> example, may have modified the client's originally requested URL
> to place it in a canonical form.
> 
> 
> Application Configuration
> -------------------------
> 
> This specification does not define how a server selects or
> obtains an application to invoke.  These and other configuration
> options are highly server-specific matters.  It is expected that
> server/gateway authors will document how to configure the server to
> execute a particular application object, and with what options (such
> as threading options).
> 
> Framework authors, on the other hand, should document how to create
> an application object that wraps their framework's functionality.
> The user, who has chosen both the server and the application
> framework, must connect the two together.  However, since both the
> framework and the server now have a common interface, this should
> be merely a mechanical matter, rather than a significant engineering
> effort for each new server/framework pair.
> 
> 
> Middleware
> ----------
> 
> Note that a single object may play the role of a server with respect
> to some application(s), while also acting as an application with
> respect to some server(s).  Such "middleware" components can perform
> such functions as:
> 
>   * Routing a request to different application objects based on the
>     target URL, after rewriting the ``environ`` accordingly.
> 
>   * Allowing multiple applications or frameworks to run side-by-side
>     in the same process
> 
>   * Load balancing and remote processing, by forwarding requests and
>     responses over a network
> 
>   * Performing content postprocessing, such as applying XSL stylesheets
> 
> Given the existence of applications and servers conforming to this
> specification, the appearance of such reusable middleware becomes
> a possibility.
> 
> Middleware components that transform the request or response data
> should in general remove WSGI extension data from the ``environ``
> that the middleware does not understand, to prevent applications
> from inadvertently bypassing the middleware's mediation of the
> interaction by use of a server extension.  The simplest way to do
> this is to just delete keys from ``environ`` that are all lowercase
> and do not begin with ``"wsgi."``, before passing the ``environ``
> on to the application.

I don't understand this.  To me it seems more reasonable that middleware 
leave the extra arguments in place.

For instance, let's say I have a URL redirecting middleware.  There's a 
chance I need to look at the parsed form of QUERY_STRING, and I cache 
the result as a dictionary in, say, webkit.query_vars.  That's just as 
valid later.  Oh, well, unless someone rewrites QUERY_STRING.  So to be 
safe, I put the query string I parsed in webkit.query_string.

But maybe I have some other middleware that handles configuration.  It 
runs after the URL parser, for localized configuration.  It doesn't 
necessarily know about the query string, or about the other piece of 
middleware.  And it shouldn't know about it, because what would be the 
point of that?  They are decoupled.  But I don't want it throwing away 
that information.

In that case, it's just some lost time reparsing the URL, but I can 
imagine more important things, and a lot of pieces of middleware where 
the only point is that they add something to the environ dictionary. 
E.g., a session-handling middleware.  There's no point to these if 
other middleware is going to throw information away.

If there are reliability issues -- like middleware rewriting QUERY_STRING, 
but passing through a cached parse of the old QUERY_STRING that it 
didn't know about -- these can be handled pretty easily.  But if one 
middleware throws away keys it doesn't know about, it messes up the 
whole stack.
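
Here is a rough sketch of the arrangement described above, under a few assumptions: the keys webkit.query_string and webkit.query_vars are the ones from my example, example.config is a made-up key, and parse_qs is imported from its current standard-library location.  The point is that the two pieces of middleware know nothing about each other, and the stale-cache case is handled by comparing the stored string.

```python
from urllib.parse import parse_qs  # current location of parse_qs


def query_cache_middleware(app):
    """Cache a parse of QUERY_STRING, plus the exact string that was parsed."""
    def wrapped(environ, start_response):
        qs = environ.get('QUERY_STRING', '')
        environ['webkit.query_string'] = qs
        environ['webkit.query_vars'] = parse_qs(qs)
        return app(environ, start_response)
    return wrapped


def config_middleware(app, settings):
    """Unrelated middleware: adds its own key and leaves all others alone."""
    def wrapped(environ, start_response):
        environ['example.config'] = settings  # hypothetical key name
        return app(environ, start_response)
    return wrapped


def get_query_vars(environ):
    """Reuse the cached parse only if QUERY_STRING has not changed since."""
    qs = environ.get('QUERY_STRING', '')
    if environ.get('webkit.query_string') == qs:
        return environ['webkit.query_vars']
    return parse_qs(qs)
```

If config_middleware deleted every lowercase key it did not recognize, get_query_vars would still work, but only by reparsing; a session key added the same way would simply be lost.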

> HTTP 1.1 Expect/Continue
> ------------------------
> 
> Servers and gateways *must* provide transparent support for HTTP 1.1's
> "expect/continue" mechanism, if they implement HTTP 1.1.  This may be
> done in any of several ways:
> 
>  1. Reject all client requests containing an ``Expect: 100-continue``
>     header with a "417 Expectation failed" error.  Such requests will
>     not be forwarded to an application object.
> 
>  2. Respond to requests containing an ``Expect: 100-continue`` request
>     with an immediate "100 Continue" response, and proceed normally.
> 
>  3. Proceed with the request normally, but provide the application with
>     a ``wsgi.input`` stream that will send the "100 Continue" response
>     if/when the application first attempts to read from the input
>     stream.  The read request must then remain blocked until the client
>     responds.
> 
> Note that this behavior restriction does not apply for HTTP 1.0 requests,
> or for requests that are not directed to an application object.  For more
> information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3
> and 10.1.1.
> 
> 
> 
> 
> Questions and Answers
> =====================
> 
> 1. Why must ``environ`` be a dictionary?  What's wrong with using
>    a subclass?
> 
>    The rationale for requiring a dictionary is to maximize
>    portability between servers.  The alternative would be to define
>    some subset of a dictionary's methods as being the standard and
>    portable interface.  In practice, however, most servers will
>    probably find a dictionary adequate to their needs, and thus
>    framework authors will come to expect the full set of dictionary
>    features to be available, since they will be there more often
>    than not.  But, if some server chooses *not* to use a dictionary,
>    then there will be interoperability problems despite that
>    server's "conformance" to spec.  Therefore, making a dictionary
>    mandatory simplifies the specification and guarantees
>    interoperability.
> 
>    Note that this does not prevent server or framework developers
>    from offering specialized services as custom variables *inside*
>    the ``environ`` dictionary.  This is the recommended approach
>    for offering any such value-added services.
> 
> 2. Why can you call ``write()`` *and* yield strings/return an
>    iterator?  Shouldn't we pick just one way?
> 
>    If we supported only the iteration approach, then current
>    frameworks that assume the availability of "push" suffer.
>    But, if we only support pushing via ``write()``, then
>    server performance suffers for transmission of e.g. large
>    files (if a worker thread can't begin work on a new request
>    until all of the output has been sent).  Thus, this compromise
>    allows an application framework to support both approaches, as
>    appropriate, but with only a little more burden to the server
>    implementor than a push-only approach would require.
> 
> 3. What's the ``close()`` for?
> 
>    When writes are done during the execution of an application
>    object, the application can ensure that resources are released
>    using a try/finally block.  But, if the application returns an
>    iterator, any resources used will not be released until the
>    iterator is garbage collected.  The ``close()`` idiom allows
>    an application to release critical resources at the end of a
>    request, and it's forward-compatible with the support for
>    try/finally in generators that's proposed by PEP 325.
> 
> 4. Why is this interface so low-level?  I want feature X!  (e.g.
>    cookies, sessions, persistence, ...)
> 
>    This isn't Yet Another Python Web Framework.  It's just a way
>    for frameworks to talk to web servers, and vice versa.  If you
>    want these features, you need to pick a web framework that
>    provides the features you want.  And if that framework lets
>    you create a WSGI application, you should be able to run it
>    in most WSGI-supporting servers.  Also, some WSGI servers may
>    offer additional services via objects provided in their
>    ``environ`` dictionary; see the applicable server documentation
>    for details.  (Of course, applications that use such extensions
>    will not be portable to other WSGI-based servers.)
> 
> 5. Why use CGI variables instead of good old HTTP headers?  And
>    why mix them in with WSGI-defined variables?
> 
>    Many existing web frameworks are built heavily upon the CGI spec,
>    and existing web servers know how to generate CGI variables.  In
>    contrast, alternative ways of representing inbound HTTP information
>    are fragmented and lack market share.  Thus, using the CGI
>    "standard" seems like a good way to leverage existing
>    implementations.  As for mixing them with WSGI variables, separating
>    them would just require two dictionary arguments to be passed
>    around, while providing no real benefits.
> 
> 6. What about the status string?  Can't we just use the number,
>    passing in ``200`` instead of ``"200 OK"``?
> 
>    Doing this would complicate the server or gateway, by requiring
>    them to have a table of numeric statuses and corresponding
>    messages.  By contrast, it is easy for an application or framework
>    author to type the extra text to go with the specific response code
>    they are using, and existing frameworks often already have a table
>    containing the needed messages.  So, on balance it seems better to
>    make the application/framework responsible, rather than the server
>    or gateway.
> 
> 
> Acknowledgements
> ================
> 
> Thanks go to the many folks on the Web-SIG mailing list whose
> thoughtful feedback made this revised draft possible.  Especially:
> 
>  * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who
>    beat up on the first draft as not offering any advantages
>    over "plain old CGI", thus encouraging me to look for a
>    better approach.
> 
>  * Ian Bicking, who helped nag me into properly specifying
>    the multithreading and multiprocess options, as well as
>    badgering me to provide a mechanism for servers to supply
>    custom extension data to an application.
> 
>  * Tony Lownds, who came up with the concept of a ``start_response``
>    function that took the status and headers, returning a ``write``
>    function.
> 
> 
> References
> ==========
> 
> .. [1] The Python Wiki "Web Programming" topic
>    (http://www.python.org/cgi-bin/moinmoin/WebProgramming)
> 
> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
> 
> 
> Copyright
> =========
> 
> This document has been placed in the public domain.
> 
> 
> 
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    End:
> 
From pje at telecommunity.com  Sun Aug 22 19:12:02 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Aug 22 19:11:54 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <4128210F.3050901@colorstudy.com>
References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>

At 11:29 PM 8/21/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Note also that this goal precludes WSGI from requiring anything that
>>is not already available in deployed versions of Python.  Therefore,
>>new standard library modules are not proposed or required by this
>>specification, and nothing in WSGI requires a Python version greater
>>than 1.5.2.  (It would be a good idea, however, for future versions
>>of Python to include support for this interface in web servers
>>provided by the standard library.)
>
>Like you said, maybe 1.5.2 is optimistic.  The spec works for 1.5.2, but 
>most servers and applications will have higher requirements, and the 
>iteration is annoying to handle in those versions.

Fine, we'll say 2.2.2, since that version had True and False as well as 
__iter__.


>>If middleware can be both simple and robust, and WSGI is widely
>>available in servers and frameworks, it allows for the possibility
>>of an entirely new kind of Python web application framework: one
>>consisting of loosely-coupled WSGI middleware components.  Indeed,
>>existing framework authors may even choose to refactor their
>>frameworks' existing services to be provided in this way, becoming
>>more like libraries used with WSGI, and less like monolithic
>>frameworks.  This would then allow application developers to choose
>>"best-of-breed" components for specific functionality, rather than
>>having to commit to all the pros and cons of a single framework.
>>Of course, as of this writing, that day is doubtless quite far off.
>>In the meantime, it is a sufficient short-term goal for WSGI to
>>enable the use of any framework with any server.
>
>That's an awfully pessimistic paragraph ;)

Are you being ironic?  I'm not sure I follow you here.


>>The WSGI interface has two sides: the "server" or "gateway" side,
>>and the "application" side.  The server side invokes a callable
>>object that is provided by the application side.  The specifics
>>of how that object is provided are up to the server or gateway.
>>It is assumed that some servers or gateways will require an
>>application's deployer to write a short script to create an
>>instance of the server or gateway, and supply it with the
>>application object.  Other servers and gateways may use
>>configuration files or other mechanisms to specify where the
>>application object should be imported from.
>
>Maybe "gateway" is just distracting.

Do you have a specific suggestion here?


>>     class AppClass:
>>         """Much the same thing, but as a class"""
>>         def __init__(self, environ, start_response):
>>             self.environ = environ
>>             self.start = start_response
>>         def __iter__(self):
>>             status = '200 OK'
>>             headers = [('Content-type','text/plain')]
>>             self.start(status, headers)
>>             yield "Hello world!\n"
>>             for i in range(1,11):
>>                 yield "Extra line %s\n" % i
>
>This second example confuses me.  Though as I reread it I realize more 
>clearly what it's doing; __init__ is the callable (in essence), but self 
>is automatically returned.  I think an instance with a __call__ method 
>would be easier to understand.  OTOH, there's more concurrency 
>overhead.  I dunno.  Anyway, that one confused me.

Perhaps you could suggest some text to add to the docstring that would have 
prevented your initial confusion?


>>The application object must accept two positional arguments.  For
>>the sake of illustration, we have named them ``environ``, and
>>``start_response``, but they are not required to have these names.
>>A server or gateway *must* invoke the application object using
>>positional (not keyword) arguments.
>>The first parameter is a dictionary object, containing CGI-style
>>environment variables.
>
>I think the spec is easier to understand if you use names here, i.e., 
>"environ is a dictionary object".  Or remind the reader of the invocation, 
>i.e., note application(environ, start_response) is called.

I'll try to do something with this.


>>The second parameter is a callable accepting two positional
>>arguments: a status string of the form ``"999 Message here"``,
>>and a list of ``(header_name,header_value)`` tuples describing the
>>HTTP response header.  This callable must return another callable
>>that takes one parameter: a string to write as part of the HTTP
>>response body.
>
>"This callable must return a writing function: a function that takes a 
>single string as an argument, which is written as the HTTP response body."

I'll work on this one too.


>I guess "function" is more specific than "callable", but it seems easier 
>to understand.  Though honestly, I find the CGI example the easiest way to 
>understand this, so maybe being more accurate here is fine.

I've got to explain *somewhere* that these are any callable.  Maybe I 
should preface the overview with an explanation of what "a callable" means, 
and reinforce it once or twice in the form "such and such is a callable 
(function, method, class, callable instance, etc.) that blah blah blah".
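
For illustration, here is a rough sketch of the four kinds of callable listed above, each usable as an application object.  All of the names are invented for the example, and it is written for a current Python with plain strings as body parts, as in the draft.

```python
def as_function(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["from a function\n"]


class AsInstance:
    """An instance of this class is the application (via __call__)."""
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-type", "text/plain")])
        return ["from a callable instance\n"]


class Handlers:
    """A bound method of an instance can be the application too."""
    def handle(self, environ, start_response):
        start_response("200 OK", [("Content-type", "text/plain")])
        return ["from a bound method\n"]


class AsClass:
    """The class itself is the application; calling it returns an iterable."""
    def __init__(self, environ, start_response):
        start_response("200 OK", [("Content-type", "text/plain")])
    def __iter__(self):
        yield "from a class\n"


def null_start_response(status, headers):
    # Stand-in for what a server would supply; returns a do-nothing write().
    return lambda data: None


applications = [as_function, AsInstance(), Handlers().handle, AsClass]
bodies = ["".join(app({}, null_start_response)) for app in applications]
```

A server only ever does application(environ, start_response), so it cannot tell these apart.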


>>  * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
>>    ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the
>
>You forgot to finish your sentence.  Also SERVER_NAME is a fallback if 
>HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical 
>host name, not necessarily the actual host name.

Ah yes.  Tony already provided a patch for the typo, but I'll add something 
about HTTP_HOST.


>>``wsgi.last_call``     This value should be true if this is expected
>>                        to be the last invocation of the application
>>                        in this process.  This is provided to allow
>>                        applications to optimize their setup for
>>                        long-running vs. short-running scenarios.
>>                        This flag should normally only be true for
>>                        CGI applications, or while a server is doing
>>                        some kind of "graceful shutdown".  Note that
>>                        a server or gateway is still allowed to invoke
>>                        the application again; this flag is only
>>                        a "suggestion" to the application that it is
>>                        unlikely to be reinvoked.
>
>wsgi.last_call seems too complicated from this.

It's precisely what you agreed to as a solution for your issue.  Granted, I 
was also surprised by how long the "official" explanation of the feature 
turned out to be.


>   Really, it's for CGI and nothing else.  Maybe just 
> wsgi.cgi?  wsgi.run_once?  I think the semantics shouldn't be any more 
> general than that.  Then we can also guarantee that it won't be called again.

I'm really reluctant to require the server to make such a guarantee.  My 
understanding of your use case is really more like, "I'm not likely to run 
you again for a while, so don't optimize for frequent execution."

Hm.  Now that I'm thinking about it more, it seems to me that this could be 
just as easily handled by application/framework-side configuration, and I'm 
inclined to remove it from the spec altogether.


>>The ``start_response()`` Callable
>>---------------------------------
>>The second parameter passed to the application object is itself a
>>two-argument callable, used to begin the HTTP response and return
>>a ``write()`` callable.
>
>"The second parameters passed to the application object (start_response) 
>is a callable, used like ``start_response(status, headers)``.

I'll work on this.


>The status argument is a string like "404 Not Found" or "200 OK".  This 
>string must be pure 7-bit ASCII, containing no control characters, and not 
>terminated with a return or linefeed.
>
>The headers argument is a sequence of ``(header_name, header_value)`` 
>tuples.  Each ``header_name`` must be a valid... (and continuing on with 
>your text).

I'll work on this.


>Though I'm not clear what "folding" means.  I'm guessing you mean:
>
>Header: blah
>     continuing Header content

Yes.

>Does the HTTP spec care about folding?  Seems like a distraction to 
>mention it.

I'll check.


>>Middleware components that transform the request or response data
>>should in general remove WSGI extension data from the ``environ``
>>that the middleware does not understand, to prevent applications
>>from inadvertently bypassing the middleware's mediation of the
>>interaction by use of a server extension.  The simplest way to do
>>this is to just delete keys from ``environ`` that are all lowercase
>>and do not begin with ``"wsgi."``, before passing the ``environ``
>>on to the application.
>
>I don't understand this.  To me it seems more reasonable that middleware 
>leave the extra arguments in place.
>
>For instance, let's say I have a URL redirecting middleware.  There's a 
>chance I need to look at the parsed form of QUERY_STRING, and I cache the 
>result as a dictionary in, say, webkit.query_vars.  That's just as valid 
>later.  Oh, well, unless someone rewrites QUERY_STRING.  So to be safe, I 
>put the query string I parsed in webkit.query_string.
>
>But maybe I have some other middleware that handles configuration.  It 
>runs after the URL parser, for localized configuration.  It doesn't 
>necessarily know about the query string, or about the other piece of 
>middleware.  And it shouldn't know about it, because what would be the 
>point of that?  They are decoupled.  But I don't want it throwing away 
>that information.
>
>In that case, it's just some lost time reparsing the URL, but I can 
>imagine more important things, and a lot of pieces of middleware where the 
>only point is that they add something to the environ dictionary. E.g., a 
>session-handling middleware.  There's no point to these if other 
>middleware is going to throw information away.
>
>If there's reliability issues -- like middleware rewriting QUERY_STRING, 
>but passing through a cached parse of the old QUERY_STRING that it didn't 
>know about -- these can be handled pretty easily.  But if one middleware 
>throws away keys it doesn't know about, it messes up the whole stack.

You're right.  The extension mechanism needs to be clearer.  Instead of 
throwing away everything, there needs to be a way to identify that a 
server-supplied value may be used in place of some WSGI functionality, so 
that middleware can remove only those items, rather than every item.

Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
dictionary for items that middleware *must* either understand, or not pass 
through.  If a framework or middleware author did your hypothetical query 
string parsing, he would have to place it in 'wsgi.extensions' if he did 
not implement the cross-check you describe.

Sigh.  This will probably need to be a new section on "WSGI Extensions and 
Middleware".

From pje at telecommunity.com  Sun Aug 22 20:16:29 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sun Aug 22 20:16:22 2004
Subject: [Web-SIG] HTTP header canonicalization?
Message-ID: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>

While reviewing the HTTP/1.1 spec (RFC 2616) for information on header 
folding, I noticed an interesting bit under section "4.2 Message Headers":

    Multiple message-header fields with the same field-name MAY be
    present in a message if and only if the entire field-value for that
    header field is defined as a comma-separated list [i.e., #(values)].
    It MUST be possible to combine the multiple header fields into one
    "field-name: field-value" pair, without changing the semantics of the
    message, by appending each subsequent field-value to the first, each
    separated by a comma. The order in which header fields with the same
    field-name are received is therefore significant to the
    interpretation of the combined field value, and thus a proxy MUST NOT
    change the order of these field values when a message is forwarded.

So, although I've defined the headers sent by the application as a list of 
name/value pairs, it seems that we *could* use a dictionary instead, if we 
required that multiple headers not be used, and that some canonical form 
(e.g. all lower-case) be used for the names.

Does anybody see any issues with this?  The upside is that it makes it easy 
for servers/gateways to add missing headers (using 
'headerdict.setdefault()'), and it should also be easier for 
application/framework developers to build up their headers incrementally in 
the same way.
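
A minimal sketch of what the dictionary form might look like, assuming lower-cased names as the canonical form.  The helper names (canonicalize, add_server_defaults) and the "ExampleServer/0.1" value are made up for illustration; nothing here is part of the draft.

```python
def canonicalize(header_list):
    """Fold a list of (name, value) pairs into a dict keyed by lower-cased name.

    Repeated names are joined with ", ", which RFC 2616 section 4.2 says
    must be equivalent for any header that is allowed to repeat.
    """
    headers = {}
    for name, value in header_list:
        key = name.lower()
        if key in headers:
            headers[key] = headers[key] + ", " + value
        else:
            headers[key] = value
    return headers


def add_server_defaults(headers):
    """What a server/gateway might do: fill in headers the app left out."""
    headers.setdefault("content-type", "text/plain")
    headers.setdefault("server", "ExampleServer/0.1")  # hypothetical value
    return headers


app_headers = [("Content-Type", "text/html"),
               ("Vary", "Accept"),
               ("vary", "Cookie")]
headers = add_server_defaults(canonicalize(app_headers))
```

The application's own Content-Type survives, the two Vary entries collapse into one comma-separated value, and the missing Server header is filled in.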

The only downsides I see that could possibly come up are:

  * There's some reason to have headers with different names in a specific 
order, even though the spec is adamant that such an ordering is 
insignificant and not to be relied upon.

  * There's some reason to split multi-value headers into separate header 
lines, even though the spec is adamant that the forms are equivalent, and 
that HTTP has no limitations on line length.

Does anybody know whether any HTTP clients in practice are affected by 
these matters?

From ianb at colorstudy.com  Sun Aug 22 21:18:52 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun Aug 22 21:18:57 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
References: <5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
Message-ID: <4128F19C.1060500@colorstudy.com>

Phillip J. Eby wrote:
>> That's an awfully pessimistic paragraph ;)
> 
> 
> Are you being ironic?  I'm not sure I follow you here.

I don't know if I was being ironic.  But it was just an offhanded 
comment, not a suggestion to change anything.

>>> The WSGI interface has two sides: the "server" or "gateway" side,
>>> and the "application" side.  The server side invokes a callable
>>> object that is provided by the application side.  The specifics
>>> of how that object is provided are up to the server or gateway.
>>> It is assumed that some servers or gateways will require an
>>> application's deployer to write a short script to create an
>>> instance of the server or gateway, and supply it with the
>>> application object.  Other servers and gateways may use
>>> configuration files or other mechanisms to specify where the
>>> application object should be imported from.
>>
>>
>> Maybe "gateway" is just distracting.
> 
> 
> Do you have a specific suggestion here?

Use only the term "server".

>>>     class AppClass:
>>>         """Much the same thing, but as a class"""
>>>         def __init__(self, environ, start_response):
>>>             self.environ = environ
>>>             self.start = start_response
>>>         def __iter__(self):
>>>             status = '200 OK'
>>>             headers = [('Content-type','text/plain')]
>>>             self.start(status, headers)
>>>             yield "Hello world!\n"
>>>             for i in range(1,11):
>>>                 yield "Extra line %s\n" % i
>>
>>
>> This second example confuses me.  Though as I reread it I realize more 
>> clearly what it's doing; __init__ is the callable (in essence), but 
>> self is automatically returned.  I think an instance with a __call__ 
>> method would be easier to understand.  OTOH, there's more concurrency 
>> overhead.  I dunno.  Anyway, that one confused me.
> 
> 
> Perhaps you could suggest some text to add to the docstring that would 
> have prevented your initial confusion?

I think it makes sense when you see it in action, i.e.,:

AppClass *is* the application object (*not* instances of AppClass). 
AppClass(environ, start_response) starts the response; it returns an 
instance of itself, which is an iterator that produces the content.

I see what really confused me.  Shouldn't that be more like:

class AppClass:
     def __init__(self, environ, start_response):
         self.environ = environ
         status = '200 OK'
         headers = [('Content-type', 'text/plain')]
         start_response(status, headers)
         # return self is implicit
     def __iter__(self):
         yield "Hello world!\n"
         for i in range(1, 11):
             yield "Extra line %s\n" % i

running start_response in __iter__ seems strange to me.  Maybe it's 
correct, but I expect the call sequence to be:

application(environ, start_response)
   start_response(status, headers) returns write()
   possible write() calls
application returns iterable
server uses iterable

In this example, the write() function only is created after you start 
the iteration.  Maybe that's fine, I'm not sure -- it's a little odd, 
because when you start the iteration you expect to be getting the body, 
but the headers haven't been sent yet.  Of course, you ensure the 
headers get sent, but it definitely confuses me.
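
The difference between the two versions comes down to when start_response() runs.  A small sketch, using a recording stand-in for start_response (the names LazyApp and EagerApp are invented here):

```python
calls = []

def start_response(status, headers):
    # Record that the response was started; return a do-nothing write().
    calls.append("start_response")
    return lambda data: None


class LazyApp:
    """Draft-style example: start_response() runs on first iteration."""
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
    def __iter__(self):
        self.start("200 OK", [("Content-type", "text/plain")])
        yield "Hello world!\n"


class EagerApp:
    """Alternative: start_response() runs as soon as the app is called."""
    def __init__(self, environ, start_response):
        self.environ = environ
        start_response("200 OK", [("Content-type", "text/plain")])
    def __iter__(self):
        yield "Hello world!\n"


lazy = LazyApp({}, start_response)
started_before_iterating_lazy = bool(calls)    # nothing recorded yet
body_lazy = list(lazy)                         # iteration triggers the call

del calls[:]
eager = EagerApp({}, start_response)
started_before_iterating_eager = bool(calls)   # already recorded
body_eager = list(eager)
```

Both produce the same body; only the moment at which the status and headers become known to the server differs.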

>> I guess "function" is more specific than "callable", but it seems 
>> easier to understand.  Though honestly, I find the CGI example the 
>> easiest way to understand this, so maybe being more accurate here is 
>> fine.
> 
> 
> I've got to explain *somewhere* that these are any callable.  Maybe I 
> should preface the overview with an explanation of what "a callable" 
> means, and reinforce it once or twice in the form "such and such is a 
> callable (function, method, class, callable instance, etc.) that blah 
> blah blah".

Sure.  But it might not be that big a deal -- I think just using names 
more often might help.  "The write callable", for instance, instead of 
"a callable".

>>> ``wsgi.last_call``     This value should be true if this is expected
>>>                        to be the last invocation of the application
>>>                        in this process.  This is provided to allow
>>>                        applications to optimize their setup for
>>>                        long-running vs. short-running scenarios.
>>>                        This flag should normally only be true for
>>>                        CGI applications, or while a server is doing
>>>                        some kind of "graceful shutdown".  Note that
>>>                        a server or gateway is still allowed to invoke
>>>                        the application again; this flag is only
>>>                        a "suggestion" to the application that it is
>>>                        unlikely to be reinvoked.
>>
>>
>> wsgi.last_call seems too complicated from this.
> 
> 
> It's precisely what you agreed to as a solution for your issue.  
> Granted, I was also surprised by how long the "official" explanation of 
> the feature turned out to be.

Yes, it's what I agreed to.  But looking at the length of the 
description, I think I was wrong; it shouldn't be that complicated to 
explain.

>>   Really, it's for CGI and nothing else.  Maybe just wsgi.cgi?  
>> wsgi.run_once?  I think the semantics shouldn't be any more general 
>> than that.  Then we can also guarantee that it won't be called again.
> 
> 
> I'm really reluctant to require the server to make such a guarantee.  My 
> understanding of your use case is really more like, "I'm not likely to 
> run you again for a while, so don't optimize for frequent execution."
> 
> Hm.  Now that I'm thinking about it more, it seems to me that this could 
> be just as easily handled by application/framework-side configuration, 
> and I'm inclined to remove it from the spec altogether.

That was initially how multithreaded and multiprocess was going to be 
handled too, but I think it's really important that those will be 
specified.  CGI is the only realistic use case for this feature, but 
it's a really common use case (since it's really just a widely supported 
standard that we are building on), and it presents a distinct set of 
problems for Python.  I don't see any reason not to just be explicit 
about being in a CGI environment -- every server will clearly know if 
it's in a CGI environment, every application can ignore it if it 
chooses, everyone will know exactly what it means in the spec.

>>> Middleware components that transform the request or response data
>>> should in general remove WSGI extension data from the ``environ``
>>> that the middleware does not understand, to prevent applications
>>> from inadvertently bypassing the middleware's mediation of the
>>> interaction by use of a server extension.  The simplest way to do
>>> this is to just delete keys from ``environ`` that are all lowercase
>>> and do not begin with ``"wsgi."``, before passing the ``environ``
>>> on to the application.
>>
>>
>> I don't understand this.  To me it seems more reasonable that 
>> middleware leave the extra arguments in place.
>>
>> For instance, let's say I have a URL redirecting middleware.  There's a 
>> chance I need to look at the parsed form of QUERY_STRING, and I cache 
>> the result as a dictionary in, say, webkit.query_vars.  That's just as 
>> valid later.  Oh, well, unless someone rewrites QUERY_STRING.  So to 
>> be safe, I put the query string I parsed in webkit.query_string.
>>
>> But maybe I have some other middleware that handles configuration.  It 
>> runs after the URL parser, for localized configuration.  It doesn't 
>> necessarily know about the query string, or about the other piece of 
>> middleware.  And it shouldn't know about it, because what would be the 
>> point of that?  They are decoupled.  But I don't want it throwing away 
>> that information.
>>
>> In that case, it's just some lost time reparsing the URL, but I can 
>> imagine more important things, and a lot of pieces of middleware where 
>> the only point is that they add something to the environ dictionary. 
>> E.g., a session-handling middleware.  There's no point to these if 
>> other middleware is going to throw information away.
>>
>> If there's reliability issues -- like middleware rewriting 
>> QUERY_STRING, but passing through a cached parse of the old 
>> QUERY_STRING that it didn't know about -- these can be handled pretty 
>> easily.  But if one middleware throws away keys it doesn't know about, 
>> it messes up the whole stack.
> 
> 
> You're right.  The extension mechanism needs to be clearer.  Instead of 
> throwing away everything, there needs to be a way to identify that a 
> server-supplied value may be used in place of some WSGI functionality, 
> so that middleware can remove only those items, rather than every item.
> 
> Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
> dictionary for items that middleware *must* either understand, or not 
> pass through.  If a framework or middleware author did your hypothetical 
> query string parsing, he would have to place it in 'wsgi.extensions' if 
> he did not implement the cross-check you describe.

I'm quite comfortable with solving this on an ad hoc basis.  Generally 
the issue is middleware that rewrites the environment, but some 
extension depends on a value in the environment and isn't simultaneously 
updated.  In general, keeping a note about what the value of the key was 
will work fine, in those small number of cases where it is an issue. 
Then it's up to the extension-using application (and middleware) to 
agree on a reliable way to do things, and other pieces of middleware 
don't need to worry about any of it.
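
As a rough sketch of "keeping a note about what the value of the key was", using the webkit.query_vars and webkit.query_string keys from the earlier example.  The helper name get_query_vars is made up, and urllib.parse.parse_qs is simply the parser available in a current Python:

```python
from urllib.parse import parse_qs


def get_query_vars(environ):
    """Return the parsed QUERY_STRING, re-parsing if the cached copy is stale."""
    current = environ.get("QUERY_STRING", "")
    if environ.get("webkit.query_string") != current:
        # Either nothing is cached yet, or some middleware rewrote
        # QUERY_STRING after the cache was filled.
        environ["webkit.query_vars"] = parse_qs(current)
        environ["webkit.query_string"] = current
    return environ["webkit.query_vars"]


environ = {"QUERY_STRING": "a=1&b=2"}
first = get_query_vars(environ)

environ["QUERY_STRING"] = "a=3"    # some other middleware rewrites it
second = get_query_vars(environ)
```

Middleware that knows nothing about the webkit.* keys can pass them through untouched, and the stale cache is still detected.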

I guess the problem is that someone might build in a dependency, but not 
be careful about it, and bugs would only arise in the presence of some 
middleware which the author didn't test with.  It's the same issue if 
the author doesn't set wsgi.extensions properly, though that's more 
explicit and maybe harder to miss.

> Sigh.  This will probably need to be a new section on "WSGI Extensions 
> and Middleware".


-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From mnot at mnot.net  Mon Aug 23 00:10:40 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Mon Aug 23 00:10:44 2004
Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
Message-ID: <1A41660E-F488-11D8-82BE-000A95BD86C0@mnot.net>

The only problem I'm aware of is Set-Cookie, which can have an unquoted 
expires date in it; e.g.,

   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT

If you have two of these, the comma after the day (here, "Wednesday") 
makes parsing problematic.
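
To make the problem concrete, here is a small sketch of what happens when two such cookies are combined into one comma-separated value and then split naively.  The second cookie is invented for the example.

```python
cookie_a = ("CUSTOMER=WILE_E_COYOTE; path=/; "
            "expires=Wednesday, 09-Nov-99 23:12:40 GMT")
cookie_b = ("PART=ROCKET_LAUNCHER; path=/; "      # made-up second cookie
            "expires=Wednesday, 09-Nov-99 23:12:40 GMT")

# Combine as RFC 2616 section 4.2 describes for repeated headers...
combined = ", ".join([cookie_a, cookie_b])

# ...then split on commas the way a simple parser might.
naive_parts = [part.strip() for part in combined.split(",")]
```

Two cookies go in, but the naive split yields four fragments, because the comma inside each expires date is indistinguishable from the separator.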

Note that this is only specified in the original netscape cookie spec 
[1], not the State Management RFC [2]. See section 10.1.2 of [2] for 
more discussion of this issue.

So, you *shouldn't* see these, especially since WSGI is about the 
server side. All the same, I'll ask around to see how often they're 
still seen in the wild.

It would also be interesting to hear from people working on WSGI 
application frameworks to find out how many expect to set multiple 
cookies with expires (as opposed to max-age) in at least one; it might 
be best to simply disallow doing so, or to require quoting.

Regarding ordering of headers with different names; I don't think so. 
Note that HTTP says

"""it is "good practice" to send general-header fields first, followed 
by request-header or response-header fields, and ending with the 
entity-header fields."""

This isn't very strict, though.

WRT header length limitations, most people start to get nervous when they 
get larger than 2048 characters; some proxies (esp. older ones) did 
limit there, or even at 1024 characters.

Note that headers can be split into multiple lines as well as multiple 
instances; e.g.,

Example: foo, bar

is equivalent to

Example: foo
Example: bar

and

Example: foo,
      bar

Overall, I think that modelling headers as a dictionary in the 
application and passing them in that form to a server is a good thing, 
as long as the Set-Cookie issue is kept in mind. Servers might have to 
modify their serialisation on the wire to account for line lengths and 
aesthetics (generally, the only time you run into line length problems 
is when you're extending HTTP to do non-browsing things), but that 
doesn't need to be exposed to the application.

Cheers,


1. http://wp.netscape.com/newsref/std/cookie_spec.html
2. http://rfc2109.x42.com/


On Aug 22, 2004, at 11:16 AM, Phillip J. Eby wrote:

> Does anybody see any issues with this?  The upside is that it makes it 
> easy for servers/gateways to add missing headers (using 
> 'headerdict.setdefault()'), and it should also be easier for 
> application/framework developers to build up their headers 
> incrementally in the same way.
>
> The only downsides I see that could possibly come up are:
>
>  * There's some reason to have headers with different names in a 
> specific order, even though the spec is adamant that such an ordering 
> is insignificant and not to be relied upon.
>
>  * There's some reason to split multi-value headers into separate 
> header lines, even though the spec is adamant that the forms are 
> equivalent, and that HTTP has no limitations on line length.
>
> Does anybody know whether any HTTP clients in practice are affected by 
> these matters?


--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Mon Aug 23 00:14:43 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 00:14:28 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <4128F19C.1060500@colorstudy.com>
References: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>

At 02:18 PM 8/22/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>
>>Do you have a specific suggestion here?
>
>Use only the term "server".

I'm rather reluctant to do that, because CGI, FastCGI, and many other such 
systems are "gateways" rather than servers per se.  Technically, I would 
only consider a web server that's written in Python, or embeds Python, to 
be capable of being a "server" per the spec.  Other servers must be 
accessed via a "gateway" written in Python.  Certainly it doesn't make 
sense to talk about a CGI "server", for example.


>running start_response in __iter__ seems strange to me.  Maybe it's 
>correct, but I expect the call sequence to be:
>
>application(environ, start_response)
>   start_response(status, headers) returns write()
>   possible write() calls
>application returns iterable
>server uses iterable
>
>In this example, the write() function only is created after you start the 
>iteration.  Maybe that's fine, I'm not sure -- it's a little odd, because 
>when you start the iteration you expect to be getting the body, but the 
>headers haven't been sent yet.  Of course, you ensure the headers get 
>sent, but it definitely confuses me.

Darn.  I guess now I'll have to explain this part, too.  :)  The intent of 
the spec is to allow start_response() to be called during the first 
iteration of the iterator.  That is, you must have called start_response() 
at least by the time the first body part is yielded from the iterator.

I illustrated this in the example, but forgot to mention it in the 
text.  I'm correcting this now.
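
From the server's side, this rule might be handled along the following lines.  This is only a sketch: run_app, late_app and broken_app are invented names, and a real server would write to a socket rather than collect chunks in a list.

```python
def run_app(application, environ):
    """Call a WSGI-style app and return (status, headers, body_chunks)."""
    started = {}
    chunks = []

    def start_response(status, headers):
        started["status"] = status
        started["headers"] = headers
        return chunks.append            # stands in for write()

    for chunk in application(environ, start_response):
        # By the time the first body part arrives, the app must have
        # called start_response(), either up front or inside the iterator.
        if not started:
            raise RuntimeError("body yielded before start_response()")
        chunks.append(chunk)
    if not started:
        raise RuntimeError("application never called start_response()")
    return started["status"], started["headers"], chunks


def late_app(environ, start_response):
    # A generator: nothing here runs until the server starts iterating.
    start_response("200 OK", [("Content-type", "text/plain")])
    yield "Hello world!\n"


def broken_app(environ, start_response):
    # Returns a body without ever starting the response.
    return ["oops\n"]


status, headers, body = run_app(late_app, {})
```

The server cannot send anything until it sees the first body part, which is exactly the point at which it can rely on the status and headers being available.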


>>>   Really, it's for CGI and nothing else.  Maybe just wsgi.cgi?
>>>wsgi.run_once?  I think the semantics shouldn't be any more general than 
>>>that.  Then we can also guarantee that it won't be called again.
>>
>>I'm really reluctant to require the server to make such a guarantee.  My 
>>understanding of your use case is really more like, "I'm not likely to 
>>run you again for a while, so don't optimize for frequent execution."
>>Hm.  Now that I'm thinking about it more, it seems to me that this could 
>>be just as easily handled by application/framework-side configuration, 
>>and I'm inclined to remove it from the spec altogether.
>
>That was initially how multithreaded and multiprocess was going to be 
>handled too, but I think it's really important that those will be 
>specified.  CGI is the only realistic use case for this feature, but it's 
>a really common use case (since it's really just a widely supported 
>standard that we are building on), and it presents a distinct set of 
>problems for Python.  I don't see any reason not to just be explicit about 
>being in a CGI environment -- every server will clearly know if it's in a 
>CGI environment, every application can ignore it if it chooses, everyone 
>will know exactly what it means in the spec.

Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a shorter 
explanation:

``wsgi.run_once``      This value should be true if the server/gateway
                        expects (but does not guarantee!) that the
                        application will only be invoked this one time
                        during the life of its containing process.
                        Normally, this will only be true for a gateway
                        based on CGI (or something similar).
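
One way an application might use the flag, as a sketch only.  The template names and compile_template() are hypothetical stand-ins for whatever expensive setup a framework does.

```python
TEMPLATE_NAMES = ["index", "about", "contact"]   # hypothetical
_compiled = {}


def compile_template(name):
    # Stand-in for an expensive parse/compile step.
    return "<compiled %s>" % name


def get_template(environ, name):
    if not _compiled and not environ.get("wsgi.run_once"):
        # Long-running process: pay the full setup cost once, up front.
        for template_name in TEMPLATE_NAMES:
            _compiled[template_name] = compile_template(template_name)
    if name not in _compiled:
        # One-shot process (or an unlisted name): compile only what is needed.
        _compiled[name] = compile_template(name)
    return _compiled[name]


get_template({"wsgi.run_once": True}, "about")
compiled_when_run_once = sorted(_compiled)

_compiled.clear()
get_template({"wsgi.run_once": False}, "about")
compiled_when_long_running = sorted(_compiled)
```

Since the flag is only a hint, both branches give the same answer; they differ only in how much work is done ahead of time.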


>>You're right.  The extension mechanism needs to be clearer.  Instead of 
>>throwing away everything, there needs to be a way to identify that a 
>>server-supplied value may be used in place of some WSGI functionality, so 
>>that middleware can remove only those items, rather than every item.
>>Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
>>dictionary for items that middleware *must* either understand, or not 
>>pass through.  If a framework or middleware author did your hypothetical 
>>query string parsing, he would have to place it in 'wsgi.extensions' if 
>>he did not implement the cross-check you describe.
>
>I'm quite comfortable with solving this on an ad hoc basis.  Generally the 
>issue is middleware that rewrites the environment, but some extension 
>depends on a value in the environment and isn't simultaneously 
>updated.  In general, keeping a note about what the value of the key was 
>will work fine, in those small number of cases where it is an issue. Then 
>it's up to the extension-using application (and middleware) to agree on a 
>reliable way to do things, and other pieces of middleware don't need to 
>worry about any of it.
>
>I guess the problem is that someone might build in a dependency, but not 
>be careful about it, and bugs would only arise in the presence of some 
>middleware which the author didn't test with.  It's the same issue if the 
>author doesn't set wsgi.extensions properly, though that's more explicit 
>and maybe harder to miss.

Here's the use case I'm thinking of.  Suppose mod_python wants to expose 
some nifty super-duper API that an application can use in place of pure 
WSGI, if it's present.  But, this interface maybe bypasses certain features 
that a particular piece of middleware is intended to intercept.  So, my 
idea here is that if mod_python puts that API into a key in 
'wsgi.extensions', then any middleware will know it's safely "intercepting 
communications" if it discards any 'wsgi.extensions'.

This is different from the sort of scenario you're talking about, where you 
can have cached data include a record of its dependencies to ensure 
correctness.

So here's the idea:

  * If you provide an alternative mechanism or extension to a WSGI-supplied 
facility, you place it in the 'wsgi.extensions' dictionary

  * If you're middleware that simply adds additional data to the 'environ', 
do so, recording your dependencies if any, to avoid becoming "stale" if 
other middleware changes things

  * If you're middleware that makes changes to existing variables, or 
intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' or 
delete any extensions you can't intercept, to prevent the underlying 
application from "going around" you.
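
A sketch of how the second and third rules might look in middleware.  'wsgi.extensions' is only the proposal under discussion here, and the 'mod_python.fast_api' and 'example.session' keys are invented for the example.

```python
def session_middleware(application):
    """Only adds data to environ, so it leaves extensions alone."""
    def wrapper(environ, start_response):
        environ["example.session"] = {"user": "anonymous"}   # hypothetical key
        return application(environ, start_response)
    return wrapper


def intercepting_middleware(application):
    """Rewrites an existing variable, so it discards extensions."""
    def wrapper(environ, start_response):
        environ["PATH_INFO"] = environ.get("PATH_INFO", "").lower()
        # We changed something the server supplied; drop any extension
        # that might let the application go around us.
        environ.setdefault("wsgi.extensions", {}).clear()
        return application(environ, start_response)
    return wrapper


def show_extensions(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["extensions visible: %r\n" % sorted(environ["wsgi.extensions"])]


def make_environ():
    return {"PATH_INFO": "/Foo",
            "wsgi.extensions": {"mod_python.fast_api": object()}}


def null_start_response(status, headers):
    return lambda data: None


passed_through = session_middleware(show_extensions)(
    make_environ(), null_start_response)
intercepted = intercepting_middleware(show_extensions)(
    make_environ(), null_start_response)
```

Behind the additive middleware the application still sees the server's extension; behind the intercepting one it does not.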

Your thoughts?

From pje at telecommunity.com  Mon Aug 23 00:30:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 00:29:57 2004
Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <1A41660E-F488-11D8-82BE-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>

At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote:
>The only problem I'm aware of is Set-Cookie, which can have an unquoted 
>expires date in it; e.g.,
>
>   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT
>
>If you have two of these, the comma after the day (here, "Wednesday") 
>makes parsing problematic.
>
>Note that this is only specified in the original netscape cookie spec [1], 
>not the State Management RFC [2]. See section 10.1.2 of [2] for more 
>discussion of this issue.
>
>So, you *shouldn't* see these, especially since WSGI is about the server 
>side. All the same, I'll ask around to see how often they're still seen in 
>the wild.

Unfortunately, this seems like something that's awfully likely to be 
present in Python frameworks "in the wild".


>Regarding ordering of headers with different names; I don't think so. Note 
>that HTTP says
>
>"""it is "good practice" to send general-header fields first, followed by 
>request-header or response-header fields, and ending with the 
>entity-header fields."""
>
>This isn't very strict, though.

I was thinking that servers that want to follow "good practice" could just 
have a list of headers in the desirable order, pulling them out of the 
dictionary first.  In practice, *not* doing this simply means that every 
application or framework has to know what order headers "belong" in, so 
this doesn't seem like a terrible thing.
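
A minimal sketch of that idea, assuming the dictionary keys are lower-case
header names.  The three lists are only a partial sample of the RFC 2616
general, response and entity headers, and ``ordered_names`` is a
hypothetical helper, not something from the draft:

```python
# Partial, illustrative lists; a real server would carry the full sets.
GENERAL = ['cache-control', 'connection', 'date']
RESPONSE = ['location', 'server']
ENTITY = ['content-type', 'content-length']
PREFERRED = GENERAL + RESPONSE + ENTITY


def ordered_names(headers):
    """Return header names with the preferred ones first, in order."""
    known = [name for name in PREFERRED if name in headers]
    # Anything the server does not recognise is sent last.
    others = sorted(name for name in headers if name not in PREFERRED)
    return known + others
```

The application never sees this ordering; it only hands over the
dictionary.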


>Overall, I think that modelling headers as dictionary in the application 
>and passing them in that form to a server is a good thing, as long as the 
>Set-Cookie issue is kept in mind. Servers might have to modify their 
>serialisation on the wire to account for line lengths and aesthetics 
>(generally, the only time you run into line length problems is when you're 
>extending HTTP to do non-browsing things), but that doesn't need to be 
>exposed to the application.

Maybe a dictionary of lists would work?  That is, the ``headers`` field 
would look like:

     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}

This would be perhaps annoying for specifying simpler fields, but it would 
still be easy to write utility functions to manipulate headers.

For the content, I'm thinking we should still prohibit embedded control 
characters, but note that the server is allowed to "fold" long header lines 
if it wishes (by replacing one or more whitespace characters with '\r\n ').
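
The utility functions mentioned above could be as small as the following
sketch.  The names ``add_header``, ``set_header`` and ``serialize`` are
assumptions for illustration only.  Emitting one line per list item also
sidesteps the Set-Cookie comma problem, since values are never joined:

```python
def add_header(headers, name, value):
    """Append a value, keeping any values already stored for this name."""
    headers.setdefault(name.lower(), []).append(value)


def set_header(headers, name, value):
    """Replace every existing value for this name with a single value."""
    headers[name.lower()] = [value]


def serialize(headers):
    """Write one header line per list item; values are never comma-joined."""
    lines = []
    for name in sorted(headers):
        for value in headers[name]:
            lines.append("%s: %s\r\n" % (name, value))
    return "".join(lines)
```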

From ianb at colorstudy.com  Mon Aug 23 00:41:31 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 00:41:36 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
References: <5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
Message-ID: <4129211B.2020101@colorstudy.com>

Phillip J. Eby wrote:
> At 02:18 PM 8/22/04 -0500, Ian Bicking wrote:
> 
>> Phillip J. Eby wrote:
>>
>>>
>>> Do you have a specific suggestion here?
>>
>>
>> Use only the term "server".
> 
> 
> I'm rather reluctant to do that, because CGI, FastCGI, and many other 
> such systems are "gateways" rather than servers per se.  Technically, I 
> would only consider a web server that's written in Python, or embeds 
> Python, to be capable of being a "server" per the spec.  Other servers 
> must be accessed via a "gateway" written in Python.  Certainly it 
> doesn't make sense to talk about a CGI "server", for example.

Okay, that's fine then.

>>>>   Really, it's for CGI and nothing else.  Maybe just wsgi.cgi?
>>>> wsgi.run_once?  I think the semantics shouldn't be any more general 
>>>> than that.  Then we can also guarantee that it won't be called again.
>>>
>>>
>>> I'm really reluctant to require the server to make such a guarantee.  
>>> My understanding of your use case is really more like, "I'm not 
>>> likely to run you again for a while, so don't optimize for frequent 
>>> execution."
>>> Hm.  Now that I'm thinking about it more, it seems to me that this 
>>> could be just as easily handled by application/framework-side 
>>> configuration, and I'm inclined to remove it from the spec altogether.
>>
>>
>> That was initially how multithreaded and multiprocess was going to be 
>> handled too, but I think it's really important that those will be 
>> specified.  CGI is the only realistic use case for this feature, but 
>> it's a really common use case (since it's really just a widely 
>> supported standard that we are building on), and it presents a 
>> distinct set of problems for Python.  I don't see any reason not to 
>> just be explicit about being in a CGI environment -- every server will 
>> clearly know if it's in a CGI environment, every application can 
>> ignore it if it chooses, everyone will know exactly what it means in 
>> the spec.
> 
> 
> Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a shorter 
> explanation:
> 
> ``wsgi.run_once``      This value should be true if the server/gateway
>                        expects (but does not guarantee!) that the
>                        application will only be invoked this one time
>                        during the life of its containing process.
>                        Normally, this will only be true for a gateway
>                        based on CGI (or something similar).

Is there a reason it can't be guaranteed?

>>> You're right.  The extension mechanism needs to be clearer.  Instead 
>>> of throwing away everything, there needs to be a way to identify that 
>>> a server-supplied value may be used in place of some WSGI 
>>> functionality, so that middleware can remove only those items, rather 
>>> than every item.
>>> Hmmm.  Maybe we should have a 'wsgi.extensions' key that contains a 
>>> dictionary for items that middleware *must* either understand, or not 
>>> pass through.  If a framework or middleware author did your 
>>> hypothetical query string parsing, he would have to place it in 
>>> 'wsgi.extensions' if he did not implement the cross-check you describe.
>>
>>
>> I'm quite comfortable with solving this on an ad hoc basis.  Generally 
>> the issue is middleware that rewrites the environment, but some 
>> extension depends on a value in the environment and isn't 
>> simultaneously updated.  In general, keeping a note about what the 
>> value of the key was will work fine, in those small number of cases 
>> where it is an issue. Then it's up to the extension-using application 
>> (and middleware) to agree on a reliable way to do things, and other 
>> pieces of middleware don't need to worry about any of it.
>>
>> I guess the problem is that someone might build in a dependency, but 
>> not be careful about it, and bugs would only arise in the presence of 
>> some middleware which the author didn't test with.  It's the same 
>> issue if the author doesn't set wsgi.extensions properly, though 
>> that's more explicit and maybe harder to miss.
> 
> 
> Here's the use case I'm thinking of.  Suppose mod_python wants to expose 
> some nifty super-duper API that an application can use in place of pure 
> WSGI, if it's present.  But, this interface maybe bypasses certain 
> features that a particular piece of middleware is intended to 
> intercept.  So, my idea here is that if mod_python puts that API into a 
> key in 'wsgi.extensions', then any middleware will know it's safely 
> "intercepting communications" if it discards any 'wsgi.extensions'.
> 
> This is different from the sort of scenario you're talking about, where 
> you can have cached data include a record of its dependencies to ensure 
> correctness.
> 
> So here's the idea:
> 
>  * If you provide an alternative mechanism or extension to a 
> WSGI-supplied facility, you place it in the 'wsgi.extensions' dictionary
> 
>  * If you're middleware that simply adds additional data to the 
> 'environ', do so, recording your dependencies if any, to avoid becoming 
> "stale" if other middleware changes things
> 
>  * If you're middleware that makes changes to existing variables, or 
> intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' 
> or delete any extensions you can't intercept, to prevent the underlying 
> application from "going around" you.
> 
> Your thoughts?

Okay, that seems reasonable.  For instance, I could imagine mod_python 
putting its Apache request object in an extension.  Something like an 
exception-catching middleware wouldn't really care about this sort of 
thing, so it wouldn't clear the extensions, but a middleware that 
filtered the output wouldn't want that extension around.

I guess a general rule would be that any extension that provided a route 
around input/output should be in wsgi.extensions, and any middleware 
that relies on input and output should clear those extensions.  Should 
that rule also apply to the other environmental variables?
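
As a rough sketch of that rule: an output-filtering middleware would
discard the extensions before calling the application.  This assumes the
proposed 'wsgi.extensions' key from this thread (it is only a proposal
here) and, for brevity, an application that returns its body as an
iterable of byte strings.  ``uppercase_filter`` is a made-up example:

```python
def uppercase_filter(app):
    """Middleware that rewrites output, so it must not be bypassed."""
    def middleware(environ, start_response):
        extensions = environ.get('wsgi.extensions')
        if extensions is not None:
            # Drop anything that could route output around this filter.
            extensions.clear()
        return [chunk.upper() for chunk in app(environ, start_response)]
    return middleware
```

An exception-catching middleware, by contrast, would leave the dictionary
alone.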

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From mnot at mnot.net  Mon Aug 23 00:41:48 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Mon Aug 23 00:41:52 2004
Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
Message-ID: <73D3244D-F48C-11D8-82BE-000A95BD86C0@mnot.net>

On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote:

> At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote:
>> The only problem I'm aware of is Set-Cookie, which can have an 
>> unquoted expires date in it; e.g.,
>>
>>   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 
>> 09-Nov-99 23:12:40 GMT
>>
>> If you have two of these, the comma after the day (here, "Wednesday") 
>> makes parsing problematic.
>>
>> Note that this is only specified in the original netscape cookie spec 
>> [1], not the State Management RFC [2]. See section 10.1.2 of [2] for 
>> more discussion of this issue.
>>
>> So, you *shouldn't* see these, especially since WSGI is about the 
>> server side. All the same, I'll ask around to see how often they're 
>> still seen in the wild.
>
> Unfortunately, this seems like something that's awfully likely to be 
> present in Python frameworks "in the wild".

I'm honestly not sure. That was my assumption until recently, but I'm 
hopeful that RFC2109 may have reduced the need to accommodate this. 
Since it's a server-side framework, it can enforce conformance to the 
RFCs (there are other problems with using Expires on cookies anyway, 
esp. WRT caching) if it so chooses, as long as the application 
frameworks are willing to accept that.


>> Regarding ordering of headers with different names; I don't think so. 
>> Note that HTTP says
>>
>> """it is "good practice" to send general-header fields first, 
>> followed by request-header or response-header fields, and ending with 
>> the entity-header fields."""
>>
>> This isn't very strict, though.
>
> I was thinking that servers that want to follow "good practice" could 
> just have a list of headers in the desirable order, pulling them out 
> of the dictionary first.  In practice, *not* doing this simply means 
> that every application or framework has to know what order headers 
> "belong" in, so this doesn't seem like a terrible thing.

Agreed.


> Maybe a dictionary of lists would work?  That is, the ``headers`` 
> field would look like:
>
>     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}
>
> This would be perhaps annoying for specifying simpler fields, but it 
> would still be easy to write utility functions to manipulate headers.

Would implementations be required to separate multiple header values 
into different list items?


> For the content, I'm thinking we should still prohibit embedded 
> control characters, but note that the server is allowed to "fold" long 
> header lines if it wishes (by replacing one or more whitespace 
> characters with '\r\n ').

That *may* get tricky if it does so in the middle of quoted content, 
e.g.,

Example: foo="bar
    baz"

if whitespace is significant inside the quotes.

--
Mark Nottingham     http://www.mnot.net/

From ianb at colorstudy.com  Mon Aug 23 00:59:04 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 00:59:07 2004
Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
References: <5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>	<5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
Message-ID: <41292538.8070400@colorstudy.com>

Phillip J. Eby wrote:
> At 03:10 PM 8/22/04 -0700, Mark Nottingham wrote:
> 
>> The only problem I'm aware of is Set-Cookie, which can have an 
>> unquoted expires date in it; e.g.,
>>
>>   Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 
>> 09-Nov-99 23:12:40 GMT
>>
>> If you have two of these, the comma after the day (here, "Wednesday") 
>> makes parsing problematic.
>>
>> Note that this is only specified in the original netscape cookie spec 
>> [1], not the State Management RFC [2]. See section 10.1.2 of [2] for 
>> more discussion of this issue.
>>
>> So, you *shouldn't* see these, especially since WSGI is about the 
>> server side. All the same, I'll ask around to see how often they're 
>> still seen in the wild.
> 
> 
> Unfortunately, this seems like something that's awfully likely to be 
> present in Python frameworks "in the wild".

I don't know if that's true.  Most (all?) frameworks have an explicit 
way of setting cookies, rather than having applications generate 
Set-Cookie headers on their own.  Since they have to be modified for 
WSGI, changing this might not be so bad.  Though right now the standard 
Cookie class does create multiple headers.

Many (most?) frameworks also use a dictionary representation for headers 
as well, sometimes with distinct methods for adding and setting headers 
(where adding creates a list of values, but only if it has to).  Several 
independent response implementations seem to work this way, so it's 
pretty common.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Mon Aug 23 01:26:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 01:26:34 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <4129211B.2020101@colorstudy.com>
References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>

At 05:41 PM 8/22/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a shorter 
>>explanation:
>>``wsgi.run_once``      This value should be true if the server/gateway
>>                        expects (but does not guarantee!) that the
>>                        application will only be invoked this one time
>>                        during the life of its containing process.
>>                        Normally, this will only be true for a gateway
>>                        based on CGI (or something similar).
>
>Is there a reason it can't be guaranteed?

Is there a reason it *should* be guaranteed?  :)  The last time we had this 
discussion (December?), I thought you'd decided that the standard library's 
"atexit" facility was sufficient to cover your use case if a guarantee was 
needed here.  (I only just remembered the "atexit" discussion, or I'd have 
suggested that as the solution instead of introducing 'wsgi.last_call' a 
few days ago.)


>>Here's the use case I'm thinking of.  Suppose mod_python wants to expose 
>>some nifty super-duper API that an application can use in place of pure 
>>WSGI, if it's present.  But, this interface maybe bypasses certain 
>>features that a particular piece of middleware is intended to 
>>intercept.  So, my idea here is that if mod_python puts that API into a 
>>key in 'wsgi.extensions', then any middleware will know it's safely 
>>"intercepting communications" if it discards any 'wsgi.extensions'.
>>This is different from the sort of scenario you're talking about, where 
>>you can have cached data include a record of its dependencies to ensure 
>>correctness.
>>So here's the idea:
>>  * If you provide an alternative mechanism or extension to a 
>> WSGI-supplied facility, you place it in the 'wsgi.extensions' dictionary
>>  * If you're middleware that simply adds additional data to the 
>> 'environ', do so, recording your dependencies if any, to avoid becoming 
>> "stale" if other middleware changes things
>>  * If you're middleware that makes changes to existing variables, or 
>> intercepts any WSGI operations, do 'environ["wsgi.extensions"].clear()' 
>> or delete any extensions you can't intercept, to prevent the underlying 
>> application from "going around" you.
>>Your thoughts?
>
>Okay, that seems reasonable.  For instance, I could imagine mod_python 
>putting its Apache request object in an extension.  Something like an 
>exception-catching middleware wouldn't really care about this sort of 
>thing, so it wouldn't clear the extensions, but a middleware that filtered 
>the output wouldn't want that extension around.
>
>I guess a general rule would be that any extension that provided a route 
>around input/output should be in wsgi.extensions, and any middleware that 
>relies on input and output should clear those extensions.  Should that 
>rule also apply to the other environmental variables?

Actually, there's another way to handle this.  Suppose we put the burden on 
server authors to provide safe extensions?  Specifically, if a server 
provides an extension that can be used in place of, or as an extension to, 
any native WSGI facility (request data, response management, environment, 
etc.), then that facility *must* respect any changes made by middleware, or 
generate an appropriate error.

An example would be that if mod_python wanted to supply its request object 
as an extension, it would have to supply a variable like 
'mod_python.get_request', which would be a callable taking 'environ' and 
'start_response'.  If any 'environ' contents supplied by mod_python had 
changed, or 'start_response' wasn't the 'start_response' it gave to the 
application, it would have to either provide an alternative object, or 
raise an error, or return None, or something of that sort.  In other words, 
the burden of verification is on the extender.

This would simplify the spec somewhat, since we wouldn't need to introduce 
'wsgi.extensions', and we can also drop the suggestion for middleware 
authors to delete extensions.  Middleware is simpler too, it just changes 
what it needs to and moves on with life.  :)  We would just have to add a 
section on how to build "safe" extensions to the spec.
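
For illustration, a "safe" extension along these lines might look like the
sketch below.  The key 'example.get_request', the ``make_environ`` helper
and the dict standing in for the native request object are all
hypothetical.  Returning None is just one of the options listed above:

```python
def make_environ(native_request, start_response):
    """Build an environ that carries a self-verifying extension."""
    environ = {
        'PATH_INFO': native_request['path'],
        'QUERY_STRING': native_request['query'],
    }
    supplied = dict(environ)  # what the server originally handed out

    def get_request(env, sr):
        # Refuse if middleware swapped in its own start_response...
        if sr is not start_response:
            return None
        # ...or changed any value the server supplied.
        if any(env.get(key) != value for key, value in supplied.items()):
            return None
        return native_request

    environ['example.get_request'] = get_request
    return environ
```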

From pje at telecommunity.com  Mon Aug 23 01:33:05 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 01:32:49 2004
Subject: [Web-SIG] HTTP header canonicalization?
In-Reply-To: <73D3244D-F48C-11D8-82BE-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
	<5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822140353.02a9ecb0@mail.telecommunity.com>
	<5.1.1.6.0.20040822181654.02342a40@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040822192723.02214ec0@mail.telecommunity.com>

At 03:41 PM 8/22/04 -0700, Mark Nottingham wrote:
>On Aug 22, 2004, at 3:30 PM, Phillip J. Eby wrote:
>>Maybe a dictionary of lists would work?  That is, the ``headers`` field 
>>would look like:
>>
>>     {'content-type': ['text/plain'], 'content-length': ['1234'], ...}
>>
>>This would be perhaps annoying for specifying simpler fields, but it 
>>would still be easy to write utility functions to manipulate headers.
>
>Would implementations be required to separate multiple header values into 
>different list items?

No.  Readers would be required to look at all list items.


>>For the content, I'm thinking we should still prohibit embedded control 
>>characters, but note that the server is allowed to "fold" long header 
>>lines if it wishes (by replacing one or more whitespace characters with 
>>'\r\n ').
>
>That *may* get tricky if it does so in the middle of quoted content, e.g.,
>
>Example: foo="bar
>    baz"
>
>if whitespace is significant inside the quotes.

I think I'm going to punt on this by saying that the server can split or 
fold headers only if it can do so *safely*, where "safely" means, "the 
server has sufficient understanding of the header's format or semantics".  :(

A possible alternative is to allow applications to fold their own headers, 
but I'm reluctant to do this because I fear people using e.g. '\n' when 
they should use '\r\n' and suchlike.  Banning control characters means the 
server can easily detect when a supplied header is broken, *and* the server 
knows it always adds a single CRLF to the end of each header.
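
A minimal sketch of the server-side check this implies, assuming header
names and values arrive as text strings.  ``format_header`` is a
hypothetical name.  Note that this strict reading also rejects tabs:

```python
def format_header(name, value):
    """Reject control characters, then terminate with exactly one CRLF."""
    for text in (name, value):
        if any(ord(ch) < 32 or ord(ch) == 127 for ch in text):
            raise ValueError("control character in header %r" % (name,))
    return "%s: %s\r\n" % (name, value)
```

An application that tried to fold with a bare '\n' would then fail loudly
instead of producing a broken response.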

From angryhicKclown at netscape.net  Mon Aug 23 06:44:47 2004
From: angryhicKclown at netscape.net (angryhicKclown@netscape.net)
Date: Mon Aug 23 06:44:53 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
Message-ID: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>

Now that I understand what WSGI is intended to be used for, I like it a lot. However, I do have a few suggestions.

Although it means more typing, I think the API is too cryptic as-is. I think that applications should be callable, but should have a single parameter: gateway. The gateway parameter contains attributes and methods such as environ, start_response(), and write(). This way, it's clear to the end-user both in documentation (removing many instances of "callable" and confusion with __init__) and also is very much more natural to many programmers.

Finally, I think the most important reason this change should be implemented is because it allows the interface to be easily upgraded without breaking compatibility with older versions. Perhaps (just an example), in the future, there will be a need for a flush() method, in addition to the write() method. In the current version, start_response() would return a tuple of write() and flush(), which would break current compatibility. The only other way I see of doing this using the current spec would be passing a default parameter of the version of the API used, which is ugly. With this enhancement I propose, it is simply a means of adding a method to the gateway parameter.

Here's the example as it is now:

     def simple_app(environ, start_response):
         """Simplest possible application object"""
         status = '200 OK'
         headers = [('Content-type','text/plain')]
         write = start_response(status, headers)
         write('Hello world!\n')

With my enhancements, it would now look like:

    def simple_app(gateway):
        status = '200 OK'
        headers = [('Content-type','text/plain')]
        gateway.start_response(status, headers)
        gateway.write('Hello world!\n')


In my opinion, my proposal looks a bit clearer.

My other idea (which follows the previous proposal) is to scrap start_response() entirely, and instead set gateway.status and gateway.headers attributes. The simple app would now look like:

    def simple_app(gateway):
        gateway.status = '200 OK'
        gateway.headers = [('Content-type','text/plain')] # perhaps gateway.set_header('Content-type','text/plain')?
        gateway.write('Hello world!\n')

Any comments/criticisms are appreciated.

From ianb at colorstudy.com  Mon Aug 23 07:24:20 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 07:24:24 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
Message-ID: <41297F84.1090801@colorstudy.com>

Phillip J. Eby wrote:
> At 05:41 PM 8/22/04 -0500, Ian Bicking wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a 
>>> shorter explanation:
>>> ``wsgi.run_once``      This value should be true if the server/gateway
>>>                        expects (but does not guarantee!) that the
>>>                        application will only be invoked this one time
>>>                        during the life of its containing process.
>>>                        Normally, this will only be true for a gateway
>>>                        based on CGI (or something similar).
>>
>>
>> Is there a reason it can't be guaranteed?
> 
> 
> Is there a reason it *should* be guaranteed?  :)  The last time we had 
> this discussion (December?), I thought you'd decided that the standard 
> library's "atexit" facility was sufficient to cover your use case if a 
> guarantee was needed here.  (I only just remembered the "atexit" 
> discussion, or I'd have suggested that as the solution instead of 
> introducing 'wsgi.last_call' a few days ago.)

atexit was a different discussion.  I don't know if there's a reason it 
should be guaranteed, but then I don't know if there's any situation 
where it wouldn't be guaranteed.  I can't imagine it being used outside 
of a CGI context, and it is guaranteed for CGI.

> Actually, there's another way to handle this.  Suppose we put the burden 
> on server authors to provide safe extensions?  Specifically, if a server 
> provides an extension that can be used in place of, or as an extension 
> to, any native WSGI facility (request data, response management, 
> environment, etc.), then that facility *must* respect any changes made 
> by middleware, or generate an appropriate error.
> 
> An example would be that if mod_python wanted to supply its request 
> object as an extension, it would have to supply a variable like 
> 'mod_python.get_request', which would be a callable taking 'environ' and 
> 'start_response'.  If any 'environ' contents supplied by mod_python had 
> changed, or 'start_response' wasn't the 'start_response' it gave to the 
> application, it would have to either provide an alternative object, or 
> raise an error, or return None, or something of that sort.  In other 
> words, the burden of verification is on the extender.

I can see that working for extensions to the request, but what about 
extensions to the response?  E.g., some mod_python extension could allow 
for internal redirects -- a useful feature that won't fit into WSGI. 
There's nothing the extension could do to check for middleware that 
would be interested, as the middleware that's interested is going to 
modify the output, not the request.

> This would simplify the spec somewhat, since we wouldn't need to 
> introduce 'wsgi.extensions', and we can also drop the suggestion for 
> middleware authors to delete extensions.  Middleware is simpler too, it 
> just changes what it needs to and moves on with life.  :)  We would just 
> have to add a section on how to build "safe" extensions to the spec.

I do like the idea of simplifying this part of the spec.  If it works. 
It's also something people can work out on their own.  I expect the vast 
majority of these servers and applications to be open source, and if 
some pieces don't work together at first there's a feedback loop to fix 
that.

Also, I don't think any of these discussions need to be resolved before 
this becomes a real PEP.  There's going to be more discussion then (no 
matter how much we discuss now), and this discussion can just be part of 
that process.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Mon Aug 23 07:24:29 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 07:24:32 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
Message-ID: <41297F8D.3070907@colorstudy.com>

angryhicKclown@netscape.net wrote:
> Here's the example as it is now:
> 
>      def simple_app(environ, start_response):
>          """Simplest possible application object"""
>          status = '200 OK'
>          headers = [('Content-type','text/plain')]
>          write = start_response(status, headers)
>          write('Hello world!\n')
> 
> With my enhancements, it would now look like:
> 
>     def simple_app(gateway):
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         gateway.start_response(status, headers)
>         gateway.write('Hello world!\n')

That does look easier to understand.  There'd be no particular reason to 
put the input stream inside the environ dictionary either.  I assume it 
would simply be an error to use gateway.write before start_response.

> In my opinion, my proposal looks a bit clearer.
> 
> My other idea (which follows the previous proposal) is to scrap start_response() entirely, and instead set gateway.status and gateway.headers attributes. The simple app would now look like:
> 
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')] # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

This is harder to implement and understand.  start_response is likely to 
be an actual action on the part of the gateway; with this model you'd 
have to detect when both status and headers were set, or on the first 
call to write, or something like that.  I think an explicit 
start_response is the best idea, whether a method or function.
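
As a sketch of how the single-object style could sit on top of the
existing calling convention, assuming start_response returns a write
callable as in the current example.  ``Gateway`` and ``adapt`` are
hypothetical names, not part of any draft:

```python
class Gateway:
    """Single-object interface layered over (environ, start_response)."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self._start_response = start_response
        self._write = None

    def start_response(self, status, headers):
        # The underlying call hands back the write callable.
        self._write = self._start_response(status, headers)

    def write(self, data):
        if self._write is None:
            raise RuntimeError("write() called before start_response()")
        self._write(data)


def adapt(gateway_app):
    """Wrap a gateway-style app as an (environ, start_response) callable."""
    def app(environ, start_response):
        gateway_app(Gateway(environ, start_response))
        return []  # everything was already sent through write()
    return app
```

With an explicit start_response, writing too early is easy to detect.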

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From jim-web-sig at jimdabell.com  Mon Aug 23 07:30:06 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Mon Aug 23 07:24:45 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
Message-ID: <200408230630.06907.jim-web-sig@jimdabell.com>

On Monday 23 August 2004 05:44, angryhicKclown@netscape.net wrote:
> Although it means more typing, I think the API is too cryptic as-is. I
> think that applications should be callable, but should have a single
> parameter: gateway. The gateway parameter contains attributes and methods
> such as environ, start_response(), and write(). This way, it's clear to the
> end-user both in documentation (removing many instances of "callable" and
> confusion with __init__) and also is very much more natural to many
> programmers.

That's the first thing I thought when skimming the draft.  Why bother moving 
tuples around when you can organise the relevant information into an object 
and simply send that around instead?  It's less to keep track of, and an 
easily extendable interface.

> My other idea (which follows the previous proposal) is to scrap
> start_response() entirely, and instead set gateway.status and
> gateway.headers attributes. The simple app would now look like:
>
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')] # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

That's starting to look a lot like a mod_python handler.

My only other comment for the time being is that if the status argument to the 
start_response function was changed to an integer instead of a string, it 
would be marginally easier to compare and branch on.  A custom "reason 
phrase" that comes after the integer in the response status line can be 
provided by other means, perhaps gateway.reason_phrase, if desired.
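
For comparison, a small sketch of converting between the two forms.
``split_status`` and ``make_status`` are hypothetical helpers.  The
default reason phrases come from the standard library's
``http.client.responses`` table in current Python:

```python
from http.client import responses


def split_status(status):
    """Split a status string such as '404 Not Found' into (code, reason)."""
    code, _, reason = status.partition(' ')
    return int(code), reason


def make_status(code, reason=None):
    """Build a status string, using a stock reason phrase if none is given."""
    if reason is None:
        reason = responses.get(code, '')
    return ("%d %s" % (code, reason)).rstrip()
```

With the integer in hand, branching becomes a plain numeric comparison,
e.g. ``400 <= code < 500``.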

-- 
Jim Dabell

From ianb at colorstudy.com  Mon Aug 23 07:27:06 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 07:27:09 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
In-Reply-To: <200408230630.06907.jim-web-sig@jimdabell.com>
References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
	<200408230630.06907.jim-web-sig@jimdabell.com>
Message-ID: <4129802A.2060102@colorstudy.com>

Jim Dabell wrote:
> My only other comment for the time being is that if the status argument to the 
> start_response function was changed to an integer instead of a string, it 
> would be marginally easier to compare and branch on.  A custom "reason 
> phrase" that comes after the integer in the response status line can be 
> provided by other means, perhaps gateway.reason_phrase, if desired.

I've been thinking: is there anything, anywhere, that pays any attention 
to the reason string?

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From jim-web-sig at jimdabell.com  Mon Aug 23 07:45:36 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Mon Aug 23 07:40:15 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
In-Reply-To: <4129802A.2060102@colorstudy.com>
References: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
	<200408230630.06907.jim-web-sig@jimdabell.com>
	<4129802A.2060102@colorstudy.com>
Message-ID: <200408230645.37129.jim-web-sig@jimdabell.com>

On Monday 23 August 2004 06:27, Ian Bicking wrote:
> Jim Dabell wrote:
> > My only other comment for the time being is that if the status argument
> > to the start_response function was changed to an integer instead of a
> > string, it would be marginally easier to compare and branch on.  A custom
> > "reason phrase" that comes after the integer in the response status line
> > can be provided by other means, perhaps gateway.reason_phrase, if
> > desired.
>
> I've been thinking: is there anything, anywhere, that pays any attention
> to the reason string?

If there is, then it's broken.  According to RFC 2616, the reason string is 
intended for humans, giving localisation as an example of when it may vary.

"The Reason-Phrase is intended to give a short textual description of the 
Status-Code. The Status-Code is intended for use by automata and the 
Reason-Phrase is intended for the human user. The client is not required to 
examine or display the Reason-Phrase."

"The individual values of the numeric status codes defined for HTTP/1.1, and 
an example set of corresponding Reason-Phrase's, are presented below. The 
reason phrases listed here are only recommendations -- they MAY be replaced 
by local equivalents without affecting the protocol."

I don't think I've come across anything that pays attention to the reason 
phrase, but it's a useful reminder to developers when they are debugging 
something I suppose.
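
For illustration, here is a minimal sketch of how code that receives the
status as a string could still compare and branch on the numeric part while
leaving the reason phrase free-form.  The helper name split_status is
hypothetical and not part of the draft.

```python
def split_status(status):
    """Split a status string such as '404 Not Found' into (code, reason).

    Only the integer code is meant for programs; the reason phrase is
    kept purely for humans and may be empty.
    """
    code, _, reason = status.partition(' ')
    return int(code), reason


code, reason = split_status('404 Not Found')

# Branch on the integer, never on the reason phrase.
if 400 <= code < 500:
    kind = 'client error'
else:
    kind = 'other'
```

Because only the integer is inspected, a localised or custom reason phrase
does not affect the branch taken.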

-- 
Jim Dabell

From jim-web-sig at jimdabell.com  Mon Aug 23 07:46:27 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Mon Aug 23 07:41:06 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
References: <5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
Message-ID: <200408230646.28097.jim-web-sig@jimdabell.com>

On Monday 23 August 2004 00:26, Phillip J. Eby wrote:
> At 05:41 PM 8/22/04 -0500, Ian Bicking wrote:
> >Phillip J. Eby wrote:
> >>Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a shorter
> >>explanation:
> >>``wsgi.run_once``      This value should be true if the server/gateway
> >>                        expects (but does not guarantee!) that the
> >>                        application will only be invoked this one time
> >>                        during the life of its containing process.
> >>                        Normally, this will only be true for a gateway
> >>                        based on CGI (or something similar).
> >
> >Is there a reason it can't be guaranteed?
>
> Is there a reason it *should* be guaranteed?  :)

Clarity?  I don't know about anybody else, but I would assume something called 
run_once would only run once - and write code that also assumed this :).


-- 
Jim Dabell

From pje at telecommunity.com  Mon Aug 23 07:52:35 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 07:52:21 2004
Subject: [Web-SIG] Comments/stylistic ideas regarding WSGI
In-Reply-To: <6759AAD7.5E1CC4AC.519F8DB3@netscape.net>
Message-ID: <5.1.1.6.0.20040823011330.0315e050@mail.telecommunity.com>

At 12:44 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote:
>Now that I understand what WSGI is intended to be used for, I like it a 
>lot. However, I do have a few suggestions.
>
>Although it means more typing, I think the API is too cryptic as-is.

So, now that you understand the API, you think it's too cryptic.  :)

All kidding aside, I've made some attempts to make the spec more readable 
with respect to the various callables, as you'll see in my next draft posting.


>I think that applications should be callable, but should have a single 
>parameter: gateway. The gateway parameter contains attributes and methods 
>such as environ, start_response(), and write(). This way, it's clear to 
>the end-user both in documentation (removing many instances of "callable" 
>and confusion with __init__) and also is very much more natural to many 
>programmers.

I agree that it's more natural, but I disagree that "naturalness" is an 
important goal for the WSGI spec.  The reason is that most of WSGI's 
initial audience will be implementing exactly *one* server/gateway and/or 
application, in order to add support for it to their server or application 
framework.  They will thus have "spec in hand" when implementing.  It's 
more important that they be able to easily implement the spec.

The second audience for WSGI will be people creating "middleware" 
components, and they will appreciate the bare-bones nature of WSGI even 
more, because they will not need to implement a "gateway" class in order to 
intercept inputs, outputs, or variables.  Many fairly sophisticated pieces 
of middleware will be written as a single function (maybe with one or two 
nested functions).

Best of all, these functions will be *very* explicit as to what they are 
modifying, because they will not contain code that's needed to emulate 
functions they aren't replacing.  Using multi-functional objects like your 
"gateway" proposal means that middleware components have to implement the 
full gateway interface.


>Finally, I think the most important reason this change should be 
>implemented is because it allows the interface to be easily upgraded 
>without breaking compatibility with older versions.

Actually, the current interface includes *numerous* routes for extension, 
including additional 'wsgi.' keys, and keyword arguments to callables.


>  Perhaps (just an example), in the future, there will be a need for a 
> flush() method, in addition to the write() method. In the current 
> version, start_response() would return a tuple of write() and flush(), 
> which would break current compatibility. The only other way I see of 
> doing this using the current spec would be passing a default parameter of 
> the version of the API used, which is ugly.

It would be simple to add a 'wsgi.flush' key to the environ to supply this 
functionality, were it needed.  (Of course, flush() isn't actually needed, 
because WSGI requires write buffers to always be emptied ASAP.)
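
A rough sketch of what extension through an environ key might look like.
The 'wsgi.flush' key is purely hypothetical (it is only the example under
discussion here), and the toy run() driver below is a stand-in for a real
server, written with byte strings so it runs under current Python.

```python
def app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'first part\n')
    # Hypothetical extension: use it only if the server offers it.
    flush = environ.get('wsgi.flush')
    if flush is not None:
        flush()
    write(b'second part\n')
    return []


def run(app, offer_flush=False):
    """Toy driver that records what the application does."""
    events = []
    environ = {'REQUEST_METHOD': 'GET'}
    if offer_flush:
        environ['wsgi.flush'] = lambda: events.append(('flush',))

    def start_response(status, headers):
        events.append(('status', status))
        return lambda data: events.append(('write', data))

    for chunk in app(environ, start_response):
        events.append(('write', chunk))
    return events
```

The same application runs unchanged on a server that does not supply the
key, which is the compatibility property being argued for.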


>In my opinion, my proposal looks a bit clearer.

I agree with you, but as I said, it's not a primary goal.  WSGI will rarely 
be used directly by an application developer; it's much more likely that 
you will use some other Python Web API layered atop WSGI.  In other words, 
the intended audience is developers of servers, frameworks, and 
middleware.  And most framework and server authors will only code to the 
spec once, probably with the spec in hand so they can check their 
compliance.  I think it's better for them to have an absolutely unequivocal 
spec, that's simple to implement and easy to verify the correctness 
of.  For example, did you use a dictionary?  That's a trivial yes-or-no 
thing to check, compared to, e.g., "did I implement a sufficiently 
dictionary-like object?"


>My other idea (which follows the previous proposal) is to scrap 
>start_response() entirely, and instead set gateway.status and 
>gateway.headers attributes. The simple app would now look like:
>
>     def simple_app(gateway):
>         gateway.status = '200 OK'
>         gateway.headers = [('Content-type','text/plain')]
>         # perhaps gateway.set_header('Content-type','text/plain')?
>         gateway.write('Hello world!\n')

To properly evaluate your proposal, it's inappropriate to use the 
application-side code as a basis for comparison.  Compare the *server-side* 
code, and the code needed to implement various forms of middleware.  You 
will find that the relatively small gain on the application-side code is 
*rapidly* counterbalanced by the expanding complexity of servers and 
middleware.  For example, to implement a middleware component that applies 
an XSLT stylesheet, you'll need to create a class that implements all the 
WSGI methods, and delegates the ones it doesn't need to the previous 
gateway object.  It will also need properties so it can observe the setting 
of status and headers, and delegate those as well, while tracking what it 
needs.

By comparison, the functional architecture of WSGI allows a middleware 
component to simply pass through to the next component whatever it doesn't 
need to change.  For example, a middleware component for applying an XSLT 
stylesheet would only need to define 'start_response' and 'write' 
replacements, where the 'start_response' simply munged the headers for 
content type and length, and the 'write' would pump data into the 
stylesheet mechanism, and call the old write function with any output.
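
As a sketch of the shape described above, the following middleware replaces
only 'start_response' and 'write'.  Upper-casing stands in for the XSLT
step, and the names uppercase_middleware, simple_app and collect are
illustrative assumptions, not part of the draft.  It follows the draft's
convention that start_response() returns a write callable, and uses byte
strings so that it runs under current Python.

```python
def uppercase_middleware(app):
    """Wrap `app` so that every body chunk is upper-cased."""
    def wrapped(environ, start_response):
        def replacement_start_response(status, headers):
            # A real transformer usually changes the body length, so the
            # declared Content-Length is dropped rather than trusted.
            headers = [(name, value) for name, value in headers
                       if name.lower() != 'content-length']
            write = start_response(status, headers)

            def replacement_write(data):
                write(data.upper())
            return replacement_write

        # environ is passed through untouched; only output is changed.
        return (chunk.upper()
                for chunk in app(environ, replacement_start_response))
    return wrapped


def simple_app(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain'),
                                      ('Content-Length', '13')])
    write(b'Hello ')
    return [b'world!\n']


def collect(app):
    """Toy server: gather status, headers and body chunks in a dict."""
    seen = {'body': []}

    def start_response(status, headers):
        seen['status'] = status
        seen['headers'] = headers
        return seen['body'].append

    for chunk in app({}, start_response):
        seen['body'].append(chunk)
    return seen


result = collect(uppercase_middleware(simple_app))
```

Nothing in the wrapper mentions the parts of the interface it does not
change.  The sketch omits details a production component would need, such
as forwarding close() on the returned iterable.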

These changes are clearly connected to the functionality: there is no 
overhead being added just so the next component downstream gets a more 
"object-oriented" interface.

(I'm wondering if I should add any of this to the spec, but it already has 
a paragraph in the Rationale section saying the API is intentionally 
no-frills, and another one in the Q&A saying "Why is this interface so 
low-level?".  I'm not sure how much more I can add without it seeming 
overdefensive, although I'm sure I'll get ten times as many more "why don't 
you use an object" protests once this hits c.l.py.  Oh well.)

From pje at telecommunity.com  Mon Aug 23 08:03:16 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 08:03:02 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <41297F84.1090801@colorstudy.com>
References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>

At 12:24 AM 8/23/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>At 05:41 PM 8/22/04 -0500, Ian Bicking wrote:
>>
>>>Phillip J. Eby wrote:
>>>
>>>>Alright.  Let's make it 'wsgi.run_once'.  Here's my attempt at a 
>>>>shorter explanation:
>>>>``wsgi.run_once``      This value should be true if the server/gateway
>>>>                        expects (but does not guarantee!) that the
>>>>                        application will only be invoked this one time
>>>>                        during the life of its containing process.
>>>>                        Normally, this will only be true for a gateway
>>>>                        based on CGI (or something similar).
>>>
>>>
>>>Is there a reason it can't be guaranteed?
>>
>>Is there a reason it *should* be guaranteed?  :)  The last time we had 
>>this discussion (December?), I thought you'd decided that the standard 
>>library's "atexit" facility was sufficient to cover your use case if a 
>>guarantee was needed here.  (I only just remembered the "atexit" 
>>discussion, or I'd have suggested that as the solution instead of 
>>introducing 'wsgi.last_call' a few days ago.)
>
>atexit was a different discussion.

Really?  I thought you were asking for something to be called upon exit as 
a way of addressing this exact same issue: i.e., the app knowing when to 
clean up after itself.


>   I don't know if there's a reason it should be guaranteed, but then I 
> don't know if there's any situation where it wouldn't be guaranteed.  I 
> can't imagine it being used outside of a CGI context, and it is 
> guaranteed for CGI.

Fine.  I just don't like it being anything other than a heuristic.  Suppose 
I'm running acceptance tests?  My CGI runner will say "you're being run 
only once", except then I'll run it again when the acceptance test tests 
another input.  But, I want the acceptance test to test the operation of 
the application when it's in "cgi mode", effectively.

So, what I'm saying is that any app that I wrote, I would want 'run_once' 
or 'last_run' or whatever it was called to *not* be a guarantee of never 
running again, but only a suggestion to "rig for infrequent running".  If 
my code *actually* relied upon it being a guarantee, then testing scenarios 
are hosed.
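
A small sketch of treating 'wsgi.run_once' as a hint rather than a promise.
The session handling is invented for illustration (save_session and the
saved list are hypothetical); the point is only that the "CGI mode" branch
is safe to execute more than once in the same process.

```python
saved = []   # stands in for persistent session storage


def save_session(session):
    saved.append(dict(session))


def app(environ, start_response):
    session = {'visits': 1}
    start_response('200 OK', [('Content-Type', 'text/plain')])
    if environ.get('wsgi.run_once'):
        # Hint only: the process will probably exit soon, so persist
        # eagerly instead of caching in memory.
        save_session(session)
    return [b'ok\n']


# An acceptance test can exercise the "CGI mode" branch repeatedly.
environ = {'wsgi.run_once': True}
app(environ, lambda status, headers: None)
app(environ, lambda status, headers: None)
```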


>I can see that working for extensions to the request, but what about 
>extensions to the response?  E.g., some mod_python extension could allow 
>for internal redirects -- a useful feature that won't fit into WSGI.

Really?  Why not?  Let's say that mod_python provides the function, the app 
calls it, doesn't call 'start_response', and doesn't return an 
iterator.  What does middleware do?  Well, presumably it does 
nothing.  Definitely it does nothing if it's an output transformer, or if 
it just adds things to the request.  So, where's the problem?

For other kinds of responses, the behavior is as I outlined before: if the 
extension is replacing the existing functionality, one should have to call 
a function to get it, passing in the existing functionality (e.g. environ 
or start_response) so that the extender can verify that critical functions 
aren't being mediated by middleware.


>>This would simplify the spec somewhat, since we wouldn't need to 
>>introduce 'wsgi.extensions', and we can also drop the suggestion for 
>>middleware authors to delete extensions.  Middleware is simpler too, it 
>>just changes what it needs to and moves on with life.  :)  We would just 
>>have to add a section on how to build "safe" extensions to the spec.
>
>I do like the idea of simplifying this part of the spec.  If it works. 
>It's also something people can work out on their own.  I expect the vast 
>majority of these servers and applications to be open source, and if some 
>pieces don't work together at first there's a feedback loop to fix that.
>
>Also, I don't think any of these discussions need to be resolved before 
>this becomes a real PEP.  There's going to be more discussion then (no 
>matter how much we discuss now), and this discussion can just be part of 
>that process.

Alas, that's all too true.  :(

From ianb at colorstudy.com  Mon Aug 23 08:53:33 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 08:53:39 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
Message-ID: <4129946D.3020408@colorstudy.com>

Phillip J. Eby wrote:
>> atexit was a different discussion.
> 
> 
> Really?  I thought you were asking for something to be called upon exit 
> as a way of addressing this exact same issue: i.e., the app knowing when 
> to clean up after itself.

I'm not really that concerned about the cleanup with CGI.  I don't think 
it's that important that the application is used only once in a CGI 
context, but then I don't think it matters much the other way either. 
But I don't see any advantage for some other server to set run_once when 
it *thinks* this is the last request (but doesn't know it for sure).  In 
fact, I don't even see a reason for wsgi.run_once if the server *knows* 
this is the last request, except in the case where the last request is 
also the first request (i.e., CGI).

I just think wsgi.run_once and CGI are the same, and there's no reason 
to state it any differently than that.  And CGI guarantees your 
application won't be rerun.  Or the spec could simply be silent on the 
matter, without stressing the issue one way or the other.

>>   I don't know if there's a reason it should be guaranteed, but then I 
>> don't know if there's any situation where it wouldn't be guaranteed.  
>> I can't imagine it being used outside of a CGI context, and it is 
>> guaranteed for CGI.
> 
> 
> Fine.  I just don't like it being anything other than a heuristic.  
> Suppose I'm running acceptance tests?  My CGI runner will say "you're 
> being run only once", except then I'll run it again when the acceptance 
> test tests another input.  But, I want the acceptance test to test the 
> operation of the application when it's in "cgi mode", effectively.

If you're running multiple unit tests in a single process, you aren't in 
CGI mode, and you shouldn't set that key.  You're in some other mode. 
If CGI mode really matters, the only test that is accurate is one where 
you are actually launching a separate process.

> So, what I'm saying is that any app that I wrote, I would want 
> 'run_once' or 'last_run' or whatever it was called to *not* be a 
> guarantee of never running again, but only a suggestion to "rig for 
> infrequent running".  If my code *actually* relied upon it being a 
> guarantee, then testing scenarios are hosed.
>
>> I can see that working for extensions to the request, but what about 
>> extensions to the response?  E.g., some mod_python extension could 
>> allow for internal redirects -- a useful feature that won't fit into 
>> WSGI.
> 
> 
> Really?  Why not?  Let's say that mod_python provides the function, the 
> app calls it, doesn't call 'start_response', and doesn't return an 
> iterator.  What does middleware do?  Well, presumably it does nothing.  
> Definitely it does nothing if it's an output transformer, or if it just 
> adds things to the request.  So, where's the problem?

Well, let's say mod_python adds two extensions.  One is to do a local 
redirect, the other is to do a recursive call.  The local redirect would 
be in wsgi.extensions (if it existed), but the recursive call would not. 
  With wsgi.extensions, the middleware would eliminate the local 
redirect, and the application would be forced to use the recursive call 
and write out the result of that.  Which is what you would want, because 
then the middleware would have an opportunity to modify the output.

I still can't think of a good way to define wsgi.extensions or give 
rules for what should go in there.  I can see some case for it, but 
since it's vague I don't think it should be included in the spec. 
There's room to add it later if it turns out to be important.

> For other kinds of responses, the behavior is as I outlined before: if 
> the extension is replacing the existing functionality, one should have 
> to call a function to get it, passing in the existing functionality 
> (e.g. environ or start_response) so that the extender can verify that 
> critical functions aren't being mediated by middleware.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Mon Aug 23 08:59:51 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 08:59:54 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <4129946D.3020408@colorstudy.com>
References: <5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<4129946D.3020408@colorstudy.com>
Message-ID: <412995E7.50605@colorstudy.com>

Ian Bicking wrote:
>> Fine.  I just don't like it being anything other than a heuristic.  
>> Suppose I'm running acceptance tests?  My CGI runner will say "you're 
>> being run only once", except then I'll run it again when the 
>> acceptance test tests another input.  But, I want the acceptance test 
>> to test the operation of the application when it's in "cgi mode", 
>> effectively.
> 
> 
> If you're running multiple unit tests in a single process, you aren't in 
> CGI mode, and you shouldn't set that key.  You're in some other mode. If 
> CGI mode really matters, the only test that is accurate is one where you 
> are actually launching a separate process.

Now that I think about it, maybe it does make sense for testing purposes 
that run_once doesn't mean that it's the last run -- it would be 
annoyingly slow to start a process for each test, and might make it hard 
to do real unit tests, but if you have a different code path when 
wsgi.run_once is true then it's important to test that.  OTOH, if I'm 
testing a project, I can make sure that my code doesn't require the 
process to terminate; code and tests are hardly decoupled after all.

Anyway, I guess I retract my concern over this issue.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From andrew at andreweland.org  Mon Aug 23 11:37:29 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Mon Aug 23 11:45:56 2004
Subject: [Web-SIG] WSGI and sendfile()
Message-ID: <4129BAD9.3080104@andreweland.org>

The WSGI draft seems to be progressing well, it's great to see some 
effort at standardisation in this area.

I had a couple of thoughts:

If write() allowed an object implementing the fileno() method as a 
parameter, then an implementation is free to use the sendfile() syscall 
to efficiently send the entire contents of a file descriptor to the client.
I don't know whether others think this is useful enough functionality to 
warrant the extra implementation complexity.
If you ignore the possible efficiency gains, and sendfile() is emulated 
by the implementation, it still reduces the amount of code that needs to 
be written to serve a static file.
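
A minimal sketch of the emulated case, assuming a hypothetical helper
send_body() that a server could put behind write().  It only checks for a
fileno() method and copies the file in blocks; a real implementation could
hand the descriptor to sendfile() instead where the platform provides it.

```python
import tempfile


def send_body(write, body, blocksize=8192):
    """Send either a byte string or a file-like object through `write`."""
    if hasattr(body, 'fileno'):
        # Emulation: read the file in blocks and write each one out.
        while True:
            block = body.read(blocksize)
            if not block:
                break
            write(block)
    else:
        write(body)


chunks = []
with tempfile.TemporaryFile() as f:
    f.write(b'abc' * 10)
    f.seek(0)
    send_body(chunks.append, f, blocksize=8)
```

With this in place, an application serving a static file only has to open
it and pass the file object along.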

There's an asymmetry in streaming. Although the use of iterators 
allows a single-threaded implementation to stream a response to many 
clients simultaneously with something like select(), it doesn't work the 
other way around. If the only access to the request body is via the 
wsgi.input stream, all reads will be blocking. Although processing many 
large uploads simultaneously isn't such a common use case when 
developing websites, it can be when developing web services.

   -- Andrew Eland (http://www.andreweland.org)
From angryhicKclown at netscape.net  Mon Aug 23 16:32:00 2004
From: angryhicKclown at netscape.net (angryhicKclown@netscape.net)
Date: Mon Aug 23 16:32:09 2004
Subject: [Web-SIG] RE: Comments/stylistic ideas regarding WSGI
Message-ID: <64804A9E.49192330.519F8DB3@netscape.net>

>Date: Mon, 23 Aug 2004 01:52:35 -0400
>From: "Phillip J. Eby" <pje@telecommunity.com>
>Subject: Re: [Web-SIG] Comments/stylistic ideas regarding WSGI
>To: angryhicKclown@netscape.net, web-sig@python.org
>Message-ID: <5.1.1.6.0.20040823011330.0315e050@mail.telecommunity.com>
>Content-Type: text/plain; charset="us-ascii"; format=flowed
>
>At 12:44 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote:
>>Now that I understand what WSGI is intended to be used for, I like it a 
>>lot. However, I do have a few suggestions.
>>
>>Although it means more typing, I think the API is too cryptic as-is.
>
>So, now that you understand the API, you think it's too cryptic.  :)
>
>All kidding aside, I've made some attempts to make the spec more readable 
>with respect to the various callables, as you'll see in my next draft posting.
>
>
>>I think that applications should be callable, but should have a single 
>>parameter: gateway. The gateway parameter contains attributes and methods 
>>such as environ, start_response(), and write(). This way, it's clear to 
>>the end-user both in documentation (removing many instances of "callable" 
>>and confusion with __init__) and also is very much more natural to many 
>>programmers.
>
>I agree that it's more natural, but I disagree that "naturalness" is an 
>important goal for the WSGI spec.  The reason is that most of WSGI's
>initial audience will be implementing exactly *one* server/gateway and/or
>application, in order to add support for it to their server or application
>framework.  They will thus have "spec in hand" when implementing.  It's
>more important that they be able to easily implement the spec.

I agree, however "a callable that is passed a callable which returns a callable" could be mind-bending for some people.

>
>The second audience for WSGI will be people creating "middleware" 
>components, and they will appreciate the bare-bones nature of WSGI even 
>more, because they will not need to implement a "gateway" class in order to 
>intercept inputs, outputs, or variables.  Many fairly sophisticated pieces
>of middleware will be written as a single function (maybe with one or two 
>nested functions).
>
>Best of all, these functions will be *very* explicit as to what they are 
>modifying, because they will not contain code that's needed to emulate 
>functions they aren't replacing.  Using multi-functional objects like your
>"gateway" proposal means that middleware components have to implement the 
>full gateway interface.
>
Not necessarily. They could extend something like this:

class Gateway(object):
    def __init__(self, parent=None):
        self.parent = parent
    def __getattribute__(self, key):
        try:
            return object.__getattribute__(self, key)
        except AttributeError:
            if self.parent is not None:
                return getattr(self.parent, key)
            else:
                raise
    def write(self, data):
        raise NotImplementedError
    # ... more standard API functions here

and instantiate it with the gateway they were passed from the caller.
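
The same fall-back-to-parent idea can also be sketched with __getattr__,
which Python calls only after normal attribute lookup has failed.  The
class names below (DelegatingGateway, Base, Upper) are illustrative and not
part of any proposal.

```python
class DelegatingGateway(object):
    """Forward any attribute not defined here to the parent gateway."""

    def __init__(self, parent=None):
        self.parent = parent

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so no try/except needed.
        parent = self.__dict__.get('parent')
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)


class Base(object):
    status = '200 OK'

    def write(self, data):
        return ('base', data)


class Upper(DelegatingGateway):
    def write(self, data):
        # Override one method; everything else is delegated.
        return self.parent.write(data.upper())


g = Upper(Base())
```

Here g.status is answered by the parent, while g.write() goes through the
override.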

>
>>Finally, I think the most important reason this change should be 
>>implemented is because it allows the interface to be easily upgraded 
>>without breaking compatibility with older versions.
>
>Actually, the current interface includes *numerous* routes for extension, 
>including additional 'wsgi.' keys, and keyword arguments to callables.
>

I don't see why this can't be solved with OOP.

>
>>  Perhaps (just an example), in the future, there will be a need for a
>> flush() method, in addition to the write() method. In the current 
>> version, start_response() would return a tuple of write() and flush(), 
>> which would break current compatibility. The only other way I see of 
>> doing this using the current spec would be passing a default parameter of 
>> the version of the API used, which is ugly.
>
>It would be simple to add a 'wsgi.flush' key to the environ to supply this 
>functionality, were it needed.  (Of course, flush() isn't actually needed,
>because WSGI requires write buffers to always be emptied ASAP.)

Fair enough, however I think we're trying to solve a problem (extensions) which has already been solved by inheritance.
>
>>In my opinion, my proposal looks a bit clearer.
>
>I agree with you, but as I said, it's not a primary goal.  WSGI will rarely
>be used directly by an application developer; it's much more likely that
>you will use some other Python Web API layered atop WSGI.  In other words,
>the intended audience is developers of servers, frameworks, and
>middleware.  And most framework and server authors will only code to the
>spec once, probably with the spec in hand so they can check their
>compliance.  I think it's better for them to have an absolutely unequivocal
>spec, that's simple to implement and easy to verify the correctness
>of.  For example, did you use a dictionary?  That's a trivial yes-or-no
>thing to check, compared to, e.g., "did I implement a sufficiently 
>dictionary-like object?"

"Did I override the write() method?"

>>My other idea (which follows the previous proposal) is to scrap 
>>start_response() entirely, and instead set gateway.status and 
>>gateway.headers attributes. The simple app would now look like:
>>
>>     def simple_app(gateway):
>>         gateway.status = '200 OK'
>>         gateway.headers = [('Content-type','text/plain')]
>>         # perhaps gateway.set_header('Content-type','text/plain')?
>>         gateway.write('Hello world!\n')
>
>To properly evaluate your proposal, it's inappropriate to use the 
>application-side code as a basis for comparison.  Compare the *server-side*
>code, and the code needed to implement various forms of middleware.  You
>will find that the relatively small gain on the application-side code is
>*rapidly* counterbalanced by the expanding complexity of servers and
>middleware.  For example, to implement a middleware component that applies
>an XSLT stylesheet, you'll need to create a class that implements all the
>WSGI methods, and delegates the ones it doesn't need to the previous
>gateway object.  It will also need properties so it can observe the setting
>of status and headers, and delegate those as well, while tracking what it 
>needs.

Proposal withdrawn.

>By comparison, the functional architecture of WSGI allows a middleware 
>component to simply pass through to the next component whatever it doesn't 
>need to change.  For example, a middleware component for applying an XSLT
>stylesheet would only need to define 'start_response' and 'write' 
>replacements, where the 'start_response' simply munged the headers for 
>content type and length, and the 'write' would pump data into the 
>stylesheet mechanism, and call the old write function with any output.
>
>These changes are clearly connected to the functionality: there is no 
>overhead being added just so the next component downstream gets a more 
>"object-oriented" interface.

OK.

>(I'm wondering if I should add any of this to the spec, but it already has 
>a paragraph in the Rationale section saying the API is intentionally 
>no-frills, and another one in the Q&A saying "Why is this interface so 
>low-level?".  I'm not sure how much more I can add without it seeming
>overdefensive, although I'm sure I'll get ten times as many more "why don't
>you use an object" protests once this hits c.l.py.  Oh well.)

I'd say you should write a short paragraph under "Questions and Answers" regarding it.

It's a great proposal thus far, I just think it's not as clean as it could be.

From pje at telecommunity.com  Mon Aug 23 17:37:03 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 17:36:49 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <4129946D.3020408@colorstudy.com>
References: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com>

At 01:53 AM 8/23/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Fine.  I just don't like it being anything other than a heuristic.
>>Suppose I'm running acceptance tests?  My CGI runner will say "you're 
>>being run only once", except then I'll run it again when the acceptance 
>>test tests another input.  But, I want the acceptance test to test the 
>>operation of the application when it's in "cgi mode", effectively.
>
>If you're running multiple unit tests in a single process, you aren't in 
>CGI mode, and you shouldn't set that key.  You're in some other mode. If 
>CGI mode really matters, the only test that is accurate is one where you 
>are actually launching a separate process.

Not if the purpose is to test the code branch that e.g. saves your sessions 
when it's run in CGI mode.

What I'm getting at here is that the purpose of this "CGI mode" is to tell 
the app to perform certain behaviors on a different heuristic 
pattern.  Nothing about that requires a guarantee of being run only 
once.  I say that because a CGI application is going to get run more than 
once anyway, so obviously whatever it does can be done more than once.  
And it's hard to test if you need to start a new process every time it 
runs.
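
A minimal sketch of the testing scenario above, assuming 'wsgi.run_once' is treated purely as a hint: the application saves its session eagerly when the key is true, and a test harness calls it twice in the same process. 'save_session', 'saved' and 'run_in_cgi_mode' are hypothetical names used only for illustration.

```python
saved = []

def save_session(session):
    # Stand-in for writing session data to disk (hypothetical).
    saved.append(dict(session))

def application(environ, start_response):
    session = {'visits': 1}
    if environ.get('wsgi.run_once'):
        # Hint only: the process will probably exit soon, so persist now.
        save_session(session)
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'ok')
    return []

def run_in_cgi_mode(app):
    """Invoke `app` the way a CGI runner would, but without a new process."""
    body = []
    def start_response(status, headers):
        return body.append
    app({'wsgi.run_once': True}, start_response)
    return body

# The "CGI mode" branch is exercised twice in one process.
first = run_in_cgi_mode(application)
second = run_in_cgi_mode(application)
```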


>>>I can see that working for extensions to the request, but what about 
>>>extensions to the response?  E.g., some mod_python extension could allow 
>>>for internal redirects -- a useful feature that won't fit into WSGI.
>>
>>Really?  Why not?  Let's say that mod_python provides the function, the 
>>app calls it, doesn't call 'start_response', and doesn't return an 
>>iterator.  What does middleware do?  Well, presumably it does nothing.
>>Definitely it does nothing if it's an output transformer, or if it just 
>>adds things to the request.  So, where's the problem?
>
>Well, let's say mod_python adds two extensions.  One is to do a local 
>redirect, the other is to do a recursive call.  The local redirect would 
>be in wsgi.extensions (if it existed), but the recursive call would 
>not.  With wsgi.extensions, the middleware would eliminate the local 
>redirect, and the application would be forced to use the recursive call 
>and write out the result of that.  Which is what you would want, because 
>then the middleware would have an opportunity to modify the output.

In that case, why not have the local_redirect function require the 
start_response callable as one of its parameters?  It can then refuse if 
the output has been captured by middleware.


>I still can't think of a good way to define wsgi.extensions or give rules 
>for what should go in there.  I can see some case for it, but since it's 
>vague I don't think it should be included in the spec. There's room to add 
>it later if it turns out to be important.

We don't need wsgi.extensions, we just need for servers and gateways to 
make their extension APIs middleware-safe, by verifying that things the 
APIs depend on haven't been changed by middleware.  I'll write up an 
explanation of this in the spec.


From ianb at colorstudy.com  Mon Aug 23 17:43:17 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 17:44:37 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com>
References: <5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com>
Message-ID: <412A1095.2050603@colorstudy.com>

Phillip J. Eby wrote:
>> Well, let's say mod_python adds two extensions.  One is to do a local 
>> redirect, the other is to do a recursive call.  The local redirect 
>> would be in wsgi.extensions (if it existed), but the recursive call 
>> would not.  With wsgi.extensions, the middleware would eliminate the 
>> local redirect, and the application would be forced to use the 
>> recursive call and write out the result of that.  Which is what you 
>> would want, because then the middleware would have an opportunity to 
>> modify the output.
> 
> 
> In that case, why not have the local_redirect function require the 
> start_response callable as one of its parameters?  It can then refuse if 
> the output has been captured by middleware.

How can it tell output is going to be captured?

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Mon Aug 23 17:56:53 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 17:56:39 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <412995E7.50605@colorstudy.com>
References: <4129946D.3020408@colorstudy.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<4129946D.3020408@colorstudy.com>
Message-ID: <5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com>

At 01:59 AM 8/23/04 -0500, Ian Bicking wrote:
>Ian Bicking wrote:
>>>Fine.  I just don't like it being anything other than a heuristic.
>>>Suppose I'm running acceptance tests?  My CGI runner will say "you're 
>>>being run only once", except then I'll run it again when the acceptance 
>>>test tests another input.  But, I want the acceptance test to test the 
>>>operation of the application when it's in "cgi mode", effectively.
>>
>>If you're running multiple unit tests in a single process, you aren't in 
>>CGI mode, and you shouldn't set that key.  You're in some other mode. If 
>>CGI mode really matters, the only test that is accurate is one where you 
>>are actually launching a separate process.
>
>Now that I think about it, maybe it does make sense for testing purposes 
>that run_once doesn't mean that it's the last run -- it would be 
>annoyingly slow to start a process for each test, and might make it hard 
>to do real unit tests, but if you have a different code path when 
>wsgi.run_once is true then it's important to test that.  OTOH, if I'm 
>testing a project, I can make sure that my code doesn't require the 
>process to terminate; code and tests are hardly decoupled after all.
>
>Anyway, I guess I retract my concern over this issue.

So, leave 'wsgi.run_once' the way I last proposed it?

From ianb at colorstudy.com  Mon Aug 23 17:56:18 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 23 17:57:38 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com>
References: <4129946D.3020408@colorstudy.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<4129946D.3020408@colorstudy.com>
	<5.1.1.6.0.20040823115633.028a0d80@mail.telecommunity.com>
Message-ID: <412A13A2.2030401@colorstudy.com>

Phillip J. Eby wrote:
> So, leave 'wsgi.run_once' the way I last proposed it?

Yep.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Mon Aug 23 18:02:11 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 18:01:58 2004
Subject: [Web-SIG] WSGI and sendfile()
In-Reply-To: <4129BAD9.3080104@andreweland.org>
Message-ID: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com>

At 10:37 AM 8/23/04 +0100, Andrew Eland wrote:
>The WSGI draft seems to be progressing well, it's great to see some effort 
>at standardisation in this area.
>
>I had a couple of thoughts:
>
>If write() allowed an object implementing the fileno() method as a 
>parameter, then an implementation is free to use the sendfile() syscall to 
>efficiently send the entire contents of a file descriptor to the client.
>I don't know whether others think this is useful enough functionality to 
>warrant the extra implementation complexity.
>If you ignore the possible efficiency gains, and sendfile() is emulated by 
>the implementation, it still reduces the amount of code that needs to be 
>written to serve a static file.

If the use case is just to send *one* file, this could be supported by the 
application returning a file object; we could amend the spec to indicate 
that if the returned iterable has a 'fileno()' attribute, the server *may* 
use OS facilities to read data directly from the descriptor, but must still 
call the iterable's close() method, rather than closing the file descriptor.
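
A sketch of how a server might honor that rule, under the assumption that plain os.read() stands in for a real sendfile() call. 'send_result' is a hypothetical server-side helper, not something defined by the draft.

```python
import os
import tempfile

def send_result(result, write, chunk_size=8192):
    """Send an application's return value, using the descriptor if present."""
    try:
        try:
            fd = result.fileno()
        except (AttributeError, OSError):
            # No usable descriptor: fall back to ordinary iteration.
            for chunk in result:
                write(chunk)
        else:
            # Read straight from the descriptor, bypassing the iterable.
            while True:
                chunk = os.read(fd, chunk_size)
                if not chunk:
                    break
                write(chunk)
    finally:
        # Always go through close(); never os.close() the descriptor directly.
        if hasattr(result, 'close'):
            result.close()

# Demonstration with an unbuffered temporary file.
data = b'static file contents\n' * 100
f = tempfile.TemporaryFile(buffering=0)
f.write(data)
f.seek(0)

chunks = []
send_result(f, chunks.append)
```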


>There's an asymmetry in streaming. Although the use of iterators allows 
>a single-threaded implementation to stream a response to many clients 
>simultaneously with something like select(), it doesn't work the other way 
>around. If the only access to the request body is via the wsgi.input 
>stream, all reads will be blocking. Although processing many large uploads 
>simultaneously isn't such a common use case when developing websites, it 
>can be when developing web services.

Perhaps it should be mentioned that the server *is* allowed to buffer the 
input stream to e.g. a temporary file, *before* invoking the application.
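
One way a server could do that buffering is sketched below, assuming the request's Content-Length is known. 'buffer_input' is a hypothetical helper; the result would be placed in environ['wsgi.input'] before the application is called.

```python
import io
import tempfile

def buffer_input(raw_stream, content_length, chunk_size=8192):
    """Copy exactly `content_length` bytes of request body to a temp file."""
    spool = tempfile.TemporaryFile()
    remaining = content_length
    while remaining > 0:
        chunk = raw_stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # client closed the connection early
        spool.write(chunk)
        remaining -= len(chunk)
    spool.seek(0)  # the application reads from the start
    return spool

# Demonstration: the socket is simulated with an in-memory stream.
socket_like = io.BytesIO(b'x' * 20000 + b'NEXT REQUEST')
environ = {'wsgi.input': buffer_input(socket_like, 20000)}
body = environ['wsgi.input'].read()
```

Because the whole body is on disk before the application runs, reads from 'wsgi.input' no longer block on the client.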

From pje at telecommunity.com  Mon Aug 23 18:13:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 18:13:14 2004
Subject: [Web-SIG] RE: Comments/stylistic ideas regarding WSGI
In-Reply-To: <64804A9E.49192330.519F8DB3@netscape.net>
Message-ID: <5.1.1.6.0.20040823120237.036d36b0@mail.telecommunity.com>

At 10:32 AM 8/23/04 -0400, angryhicKclown@netscape.net wrote:
>I agree, however "a callable that is passed a callable which returns a 
>callable" could be mind-bending for some people.

As I said, I've done some work on that.  When the next draft comes out, if 
you can provide diffs of what you'd like it to say in those spots, it'll be 
helpful.


>Not necessarily. They could extend something like this:
>
>class Gateway(object):
>     def __init__(self, parent=None):
>         self.parent = parent
>     def __getattribute__(self, key):
>         try:
>             return object.__getattribute__(self, key)
>         except AttributeError:
>             if self.parent != None:
>                 return getattr(self.parent, key)
>             else:
>                 raise
>     def write(self, data):
>         raise NotImplementedError
>     # ... more standard API functions here
>
>and instantiate it with the gateway they were passed from the caller.

And all of that code is pure "excise"...  a tax on the implementor that 
doesn't provide *them* with any benefit.  It's what the Zope folks call a 
"dead chicken": boilerplate code that everybody has to use, but nobody 
understands, copied mindlessly from one implementation to another, subject 
to subtle bugs that then will only be fixed in some of the implementations 
and not in others.


> >>>Finally, I think the most important reason this change should be
> >>implemented is because it allows the interface to be easily upgraded
> >>without breaking compatibility with older versions.
> >
> >Actually, the current interface includes *numerous* routes for extension,
> >including additional 'wsgi.' keys, and keyword arguments to callables.
> >
>
>I don't see why this can't be solved with OOP.

Because the point of OOP is to encapsulate functions into one object; WSGI 
wants the functions to be as separate as possible so middleware can 
selectively replace functions and delegate to the old ones.

Thus, OOP does not "solve" anything here; it introduces more problems.  I'm 
ordinarily a very OOPish person, but this is one of those cases where it is 
the exact opposite of a solution.


>Fair enough, however I think we're trying to solve a problem (extensions) 
>which has already been solved by inheritance.

No, the appropriate solution is the "Chain Of Responsibility" pattern, if 
you're familiar with the GOF patterns.  It's just that since Python 
functions are first-class objects, it's trivial to implement a Chain Of 
Responsibility with functions, rather than creating several objects to each 
house one function.
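
For readers who have not seen the pattern done with plain functions, a minimal sketch follows. 'make_handler' and the two example handlers are invented for illustration and are unrelated to the WSGI API itself.

```python
def make_handler(can_handle, handle, next_handler=None):
    """Build one link of the chain as a closure instead of an object."""
    def handler(request):
        if can_handle(request):
            return handle(request)
        if next_handler is None:
            raise LookupError('no handler for %r' % (request,))
        # Delegate whatever this link does not care about.
        return next_handler(request)
    return handler

fallback = make_handler(lambda r: True,
                        lambda r: 'fallback:' + r)
static = make_handler(lambda r: r.startswith('/static/'),
                      lambda r: 'static:' + r,
                      fallback)

handled_here = static('/static/site.css')
passed_on = static('/index.html')
```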

From pje at telecommunity.com  Mon Aug 23 18:16:32 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 23 18:16:15 2004
Subject: [Web-SIG] Latest WSGI Draft
In-Reply-To: <412A1095.2050603@colorstudy.com>
References: <5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040820191547.01f23da0@mail.telecommunity.com>
	<5.1.1.6.0.20040822124622.0255a1c0@mail.telecommunity.com>
	<5.1.1.6.0.20040822174902.01f284b0@mail.telecommunity.com>
	<5.1.1.6.0.20040822191020.01eda780@mail.telecommunity.com>
	<5.1.1.6.0.20040823015413.03359160@mail.telecommunity.com>
	<5.1.1.6.0.20040823110736.028a9300@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040823121344.036de8b0@mail.telecommunity.com>

At 10:43 AM 8/23/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>>Well, let's say mod_python adds two extensions.  One is to do a local 
>>>redirect, the other is to do a recursive call.  The local redirect would 
>>>be in wsgi.extensions (if it existed), but the recursive call would 
>>>not.  With wsgi.extensions, the middleware would eliminate the local 
>>>redirect, and the application would be forced to use the recursive call 
>>>and write out the result of that.  Which is what you would want, because 
>>>then the middleware would have an opportunity to modify the output.
>>
>>In that case, why not have the local_redirect function require the 
>>start_response callable as one of its parameters?  It can then refuse if 
>>the output has been captured by middleware.
>
>How can it tell output is going to be captured?

If 'start_response' is a different 'start_response' than the one it gave 
the application.  A middleware component has no need to replace 
'start_response' unless it needs to control the output in some way.  Thus, 
using any extension API that allows direct output would be bypassing 
middleware in that case.
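
A sketch of that identity check, assuming a gateway that exposes its extension through an environ key. The 'Gateway' class, the 'example.local_redirect' key and the 'capturing_middleware' wrapper are all hypothetical.

```python
class Gateway:
    """Toy gateway offering a middleware-safe local_redirect extension."""

    def __init__(self, application):
        self.application = application
        self.redirected_to = None

    def handle(self, environ):
        environ = dict(environ)
        output = []

        def start_response(status, headers):
            output.append((status, headers))
            return output.append

        def local_redirect(url, app_start_response):
            # If the app was handed a different start_response, some
            # middleware wants to see the output, so refuse to bypass it.
            if app_start_response is not start_response:
                raise RuntimeError('output is captured by middleware')
            self.redirected_to = url

        environ['example.local_redirect'] = local_redirect
        self.application(environ, start_response)
        return output


def redirecting_app(environ, start_response):
    environ['example.local_redirect']('/elsewhere', start_response)

def capturing_middleware(application):
    def middleware(environ, start_response):
        def my_start_response(status, headers):
            return start_response(status, headers)
        return application(environ, my_start_response)
    return middleware

direct = Gateway(redirecting_app)
direct.handle({})

wrapped = Gateway(capturing_middleware(redirecting_app))
try:
    wrapped.handle({})
    refused = False
except RuntimeError:
    refused = True
```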

From andrew at andreweland.org  Tue Aug 24 12:57:32 2004
From: andrew at andreweland.org (Andrew Eland)
Date: Tue Aug 24 13:06:09 2004
Subject: [Web-SIG] WSGI and sendfile()
In-Reply-To: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com>
References: <5.1.1.6.0.20040823115702.033d7c80@mail.telecommunity.com>
Message-ID: <412B1F1C.4010209@andreweland.org>

Phillip J. Eby wrote:

> At 10:37 AM 8/23/04 +0100, Andrew Eland wrote:
> 
> we could amend the spec to indicate that if the returned iterable has a
> 'fileno()' attribute, the server *may* use OS facilities to read data
> directly from the descriptor, but must still call the iterable's close()
> method, rather than closing the file descriptor.

That sounds fine to me.

> Perhaps it should be mentioned that the server *is* allowed to buffer 
> the input stream to e.g. a temporary file, *before* invoking the 
> application.

Another solution would be to feed the request body to the application as 
it arrives, via some callback function. It's probably not worth the 
extra complexity, as the number of applications that stream a response 
based on incremental processing of the request body will be pretty small.

   -- Andrew (http://www.andreweland.org)


From floydophone at gmail.com  Tue Aug 24 16:52:00 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Tue Aug 24 16:52:02 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <20040824100006.1E15F1E400A@bag.python.org>
References: <20040824100006.1E15F1E400A@bag.python.org>
Message-ID: <6654eac4040824075242be15dd@mail.gmail.com>

Is there a "Hello, world!" type of middleware that I could take a look at?
From pje at telecommunity.com  Tue Aug 24 17:34:55 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 24 17:34:50 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <6654eac4040824075242be15dd@mail.gmail.com>
References: <20040824100006.1E15F1E400A@bag.python.org>
	<20040824100006.1E15F1E400A@bag.python.org>
Message-ID: <5.1.1.6.0.20040824113118.0329e5b0@mail.telecommunity.com>

At 10:52 AM 8/24/04 -0400, Peter Hunt wrote:
>Is there a "Hello, world!" type of middleware that I could take a look at?

How about this:

     def make_middleware(application):

         def middleware(environ, start_response):

             def extra_response(status,headers):
                 write = start_response(status, headers)
                 write('Hello world!\n')
                 return write

             return application(environ, extra_response)

         return middleware

Calling 'make_middleware(some_application)' creates a new "application" 
object that can be supplied to a server, that prepends "Hello world" to the 
body of every response issued by the original application object.
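
To see the effect end to end, the middleware can be driven by a stand-in server, sketched here with the body sent as bytes (as modern Python requires). 'some_application' and 'run_once' are hypothetical helpers written only for this demonstration.

```python
def make_middleware(application):
    def middleware(environ, start_response):
        def extra_response(status, headers):
            write = start_response(status, headers)
            write(b'Hello world!\n')   # prepended to every response
            return write
        return application(environ, extra_response)
    return middleware

def some_application(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'original body\n')
    return []

def run_once(application):
    """Minimal stand-in server: collect everything written or returned."""
    body = []
    def start_response(status, headers):
        return body.append
    for chunk in application({}, start_response) or []:
        body.append(chunk)
    return b''.join(body)

output = run_once(make_middleware(some_application))
```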

From ianb at colorstudy.com  Tue Aug 24 18:26:27 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Aug 24 18:28:21 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <6654eac4040824075242be15dd@mail.gmail.com>
References: <20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
Message-ID: <412B6C33.9080102@colorstudy.com>

Peter Hunt wrote:
> Is there a "Hello, world!" type of middleware that I could take a look at?

I haven't tested this (but I'll try to tonight), but here's perhaps a 
more realistic middleware.  This compresses (with gzip) the response 
from the application (if it is allowed to):

import gzip
from cStringIO import StringIO

class gzip_middleware(object):

     def __init__(self, application, compress_level=5):
         self.application = application
         self.compress_level = compress_level

     def __call__(self, environ, start_response):
         if 'gzip' not in environ.get('HTTP_ACCEPT'):
             # nothing for us to do, so this middleware will
             # be a no-op:
             return application(environ, start_response)
         response = GzipResponse(start_response, self.compress_level)
         app_iter = self.application(environ,
                                     response.gzip_start_response)
         response.finish_response(app_iter)
         return None

class GzipResponse(object):

     def __iter__(self, start_response, compress_level):
         self.start_response = start_response
         self.compress_level = compress_level
         self.gzip_fileobj = None

     def gzip_start_response(self, status, headers):
         # This isn't part of the spec yet:
         if headers.has_key('content-encoding'):
             # we won't double-encode
             return self.start_response(status, headers)

         headers['content-encoding'] = 'gzip'
         raw_writer = self.start_response(status, headers)
         dummy_fileobj = object()
         dummy_fileobj.write = raw_writer
         self.gzip_fileobj = GzipFile('', 'wb', self.compress_level,
                                      dummy_fileobj)
         return self.gzip_fileobj.write

     def finish_response(self, app_iter):
         try:
             for s in app_iter:
                 self.gzip_fileobj.write(s)
         finally:
             if hasattr(app_iter, 'close'):
                 app_iter.close()
             self.gzip_fileobj.close()


Hmm... For a very simple filter, I actually found that surprisingly 
difficult to write.  And I think it should take advantage of its 
server's iteration, but currently it only uses the "push" (write 
function) aspect of the server.  But I'm not sure how exactly I would do 
that, especially so that the iteration actually had any beneficial 
properties.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Tue Aug 24 19:26:47 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 24 19:26:34 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412B6C33.9080102@colorstudy.com>
References: <6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>

At 11:26 AM 8/24/04 -0500, Ian Bicking wrote:
>Peter Hunt wrote:
>>Is there a "Hello, world!" type of middleware that I could take a look at?
>
>I haven't tested this (but I'll try to tonight), but here's perhaps a more 
>realistic middleware.  This compresses (with gzip) the response from the 
>application (if it is allowed to):
>
>import gzip
>from cStringIO import StringIO
>
>class gzip_middleware(object):
>
>     def __init__(self, application, compress_level=5):
>         self.application = application
>         self.compress_level = compress_level
>
>     def __call__(self, environ, start_response):
>         if 'gzip' not in environ.get('HTTP_ACCEPT'):
>             # nothing for us to do, so this middleware will
>             # be a no-op:
>             return application(environ, start_response)
>         response = GzipResponse(start_response, self.compress_level)
>         app_iter = self.application(environ,
>                                     response.gzip_start_response)
>         response.finish_response(app_iter)
>         return None
>
>class GzipResponse(object):
>
>     def __iter__(self, start_response, compress_level):

I think you meant '__init__' here.


>         self.start_response = start_response
>         self.compress_level = compress_level
>         self.gzip_fileobj = None
>
>     def gzip_start_response(self, status, headers):
>         # This isn't part of the spec yet:
>         if headers.has_key('content-encoding'):
>             # we won't double-encode
>             return self.start_response(status, headers)
>
>         headers['content-encoding'] = 'gzip'
>         raw_writer = self.start_response(status, headers)
>         dummy_fileobj = object()
>         dummy_fileobj.write = raw_writer
>         self.gzip_fileobj = GzipFile('', 'wb', self.compress_level,
>                                      dummy_fileobj)
>         return self.gzip_fileobj.write
>
>     def finish_response(self, app_iter):
>         try:
>             for s in app_iter:
>                 self.gzip_fileobj.write(s)
>         finally:
>             if hasattr(app_iter, 'close'):
>                 app_iter.close()
>             self.gzip_fileobj.close()
>
>
>
>
>Hmm... For a very simple filter, I actually found that surprisingly 
>difficult to write.

Maybe because you used classes unnecessarily?


     class GzipOutput(object):
         pass

     def gzip_middleware(application, compress_level=5):

         def do_gzip(environ, start_response):

             writer = []

             if 'gzip' not in environ.get('HTTP_ACCEPT'):
                 # nothing for us to do, so this middleware will
                 # be a no-op:
                 return application(environ, start_response)

             def gzip_start_response(status, headers):
                 if 'content-encoding' in headers:
                     writer.append(start_response(status,headers))
                 else:
                     headers['content-encoding'] = gzip
                     raw_writer = start_response(status,headers)
                     dummy_fileobj = GzipOutput()
                     dummy_fileobj.write = raw_writer
                     gzip_file = GzipFile('','wb',compress_level,dummy_fileobj)
                     writer.append(gzip_file.write)
                 return writer[0]

             app_iter = application(environ,gzip_start_response)

             if app_iter and writer:
                 try:
                     map(writer[0],app_iter)
                 finally:
                     if hasattr(app_iter,'close'):
                          app_iter.close()
             else:
                 return app_iter

         return do_gzip

Hm.  That's only slightly less complicated.  Still, the only "excise" is 
handling the try/finally for the close -- virtually everything else is 
directly connected to the required functionality.  (By the way, your 
implementation tries to iterate even if the app returns None, and you can't 
set arbitrary attributes on 'object' instances.)

It may be that the PEP should contain a list of suggested utility 
functions, like this one:

     def finish_response(write_func,app_return):
         if app_return:
             try:
                 map(write_func,app_return)
             finally:
                 if hasattr(app_return,'close'):
                     app_return.close()

Such a routine would come in handy for response-munging middleware.
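
A variant of that helper with an explicit loop is sketched below. In Python 3, map() is lazy, so a bare map(write_func, app_return) would never call write_func; the loop avoids that. The 'Closable' class is a hypothetical iterable used only to show that close() is called.

```python
def finish_response(write_func, app_return):
    """Write out whatever the application returned, then close it."""
    if app_return:
        try:
            for data in app_return:
                write_func(data)
        finally:
            if hasattr(app_return, 'close'):
                app_return.close()

class Closable:
    """Iterable that records whether close() was called."""
    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False
    def __iter__(self):
        return iter(self.chunks)
    def close(self):
        self.closed = True

written = []
result = Closable([b'a', b'b'])
finish_response(written.append, result)

# An application that returned None is simply skipped.
finish_response(written.append, None)
```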


>   And I think it should take advantage of its server's iteration, but 
> currently it only uses the "push" (write function) aspect of the 
> server.  But I'm not sure how exactly I would do that, especially so that 
> the iteration actually had any beneficial properties.

For the given application, it's not important.  Gzipping a server push 
stream probably doesn't make a lot of sense.  :)

If you *really* want to support it, you could do something like:

    def iter_response(transformer, queue, app_return):
        for data in app_return:
            transformer(data)
            if queue:
                yield ''.join(queue)
                queue[:] = []

Where "queue" is a list appended to by the 'transformer'.  For your 
example, you could set it up like this:

     queue = []
     dummy_fileobj.write = queue.append

I'll leave the rest as an exercise for the reader.  :)
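
One possible way to finish that exercise is sketched here, assuming the application returns an iterable of byte strings. 'QueueSink' and 'iter_gzip' are hypothetical names; the GzipFile is closed at the end so that its trailer is emitted.

```python
import gzip

class QueueSink:
    """File-like object whose write() appends to a shared list."""
    def __init__(self, queue):
        self.queue = queue
    def write(self, data):
        if data:
            self.queue.append(data)
        return len(data)
    def flush(self):
        pass

def iter_gzip(app_return, compress_level=5):
    """Yield gzip-compressed blocks as the wrapped iterable produces data."""
    queue = []
    gz = gzip.GzipFile(fileobj=QueueSink(queue), mode='wb',
                       compresslevel=compress_level)
    try:
        for data in app_return:
            gz.write(data)
            if queue:
                yield b''.join(queue)
                queue[:] = []
        gz.close()  # emits any buffered data plus the gzip trailer
        if queue:
            yield b''.join(queue)
    finally:
        if hasattr(app_return, 'close'):
            app_return.close()

compressed = b''.join(iter_gzip([b'Hello, ', b'world!']))
```

The compressor buffers internally, so many iterations may yield nothing; only the final block is guaranteed to appear.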

From pje at telecommunity.com  Tue Aug 24 20:07:52 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 24 20:07:36 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
References: <412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
Message-ID: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>

At 01:26 PM 8/24/04 -0400, Phillip J. Eby wrote:
>At 11:26 AM 8/24/04 -0500, Ian Bicking wrote:
>>         headers['content-encoding'] = 'gzip'

>                     headers['content-encoding'] = gzip

Oops.  We both goofed: this should be:

     headers['content-encoding'] = ['gzip']

From ianb at colorstudy.com  Wed Aug 25 02:37:01 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 02:37:07 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
References: <412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
Message-ID: <412BDF2D.5090404@colorstudy.com>

Phillip J. Eby wrote:
> Oops.  We both goofed: this should be:
> 
>     headers['content-encoding'] = ['gzip']

Was there any resolution on how headers are going to work?  While it's 
certainly more confusing to deal with a list of headers, as opposed to a 
dictionary of headers, I feel like the whole thing is a little vague at 
this point.

Must all values be lists?  Other sequences?  Is it an error to put a 
string there?  I fear I'd see a lot of:

content-encoding: g
content-encoding: z
content-encoding: i
content-encoding: p

Must all keys be lower case?  If not, headers aren't going to be any 
easier to work with as a dictionary than as a list.  If they are 
required to be lower case, again it seems like a fragile part of the spec.

It all makes me think that it'd just be easier to write the four or so 
functions to make lists of headers easy to deal with.
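
A sketch of what those helper functions might look like, assuming headers are kept as a list of (name, value) pairs and matched case-insensitively. The function names are invented for illustration.

```python
def get_header(headers, name, default=None):
    """Return the first value for `name`, or `default`."""
    name = name.lower()
    for key, value in headers:
        if key.lower() == name:
            return value
    return default

def get_all_headers(headers, name):
    """Return every value for `name`, in order."""
    name = name.lower()
    return [value for key, value in headers if key.lower() == name]

def remove_header(headers, name):
    """Delete all entries for `name`, modifying the list in place."""
    name = name.lower()
    headers[:] = [(k, v) for k, v in headers if k.lower() != name]

def set_header(headers, name, value):
    """Replace any existing entries for `name` with a single value."""
    remove_header(headers, name)
    headers.append((name, value))

headers = [('Content-Type', 'text/html'),
           ('Set-Cookie', 'a=1'),
           ('Set-Cookie', 'b=2')]
set_header(headers, 'content-encoding', 'gzip')
```

With a list, a bare string value cannot be mistaken for a sequence of values, which sidesteps the "g, z, i, p" failure mode.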

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Wed Aug 25 02:49:34 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 02:49:42 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
References: <6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
Message-ID: <412BE21E.20504@colorstudy.com>

Phillip J. Eby wrote:
>> Hmm... For a very simple filter, I actually found that surprisingly 
>> difficult to write.
> 
> 
> Maybe because you used classes unnecessarily?

It wasn't so much the result, as the process -- keeping track of the 
state was difficult for me, with one function outside of the application 
(the application wrapper) and another inside of the application (the 
start_response wrapper).  I had to remember which function fit what part 
of the process, and how to best keep the state around, and that was more 
difficult than I expected.

>     class GzipOutput(object):
>         pass
> 
>     def gzip_middleware(application, compress_level=5):
> 
>         def do_gzip(environ, start_response):
> 
>             writer = []

Using a list to simulate mutable inner scopes is hardly what I'd 
consider a Hello World class of example!  While the trick works, it's 
not something that I would do without a compelling reason; certainly not 
just to save creating one class.

>             if 'gzip' not in environ.get('HTTP_ACCEPT'):
>                 # nothing for us to do, so this middleware will
>                 # be a no-op:
>                 return application(environ, start_response)
> 
>             def gzip_start_response(status, headers):
>                 if 'content-encoding' in headers:
>                     writer.append(start_response(status,headers))
>                 else:
>                     headers['content-encoding'] = gzip
>                     raw_writer = start_response(status,headers)
>                     dummy_fileobj = GzipOutput()
>                     dummy_fileobj.write = raw_writer
>                     gzip_file = GzipFile('','wb',compress_level,dummy_fileobj)
>                     writer.append(gzip_file.write)
>                 return writer[0]
> 
>             app_iter = application(environ,gzip_start_response)
> 
>             if app_iter and writer:
>                 try:
>                     map(writer[0],app_iter)
>                 finally:
>                     if hasattr(app_iter,'close'):
>                          app_iter.close()
>             else:
>                 return app_iter
> 
>         return do_gzip
> 
> Hm.  That's only slightly less complicated.  Still, the only "excise" is 
> handling the try/finally for the close -- virtually everything else is 
> directly connected to the required functionality.  (By the way, your 
> implementation tries to iterate even if the app returns None, and you 
> can't set arbitrary attributes on 'object' instances.)
> 
> It may be that the PEP should contain a list of suggested utility 
> functions, like this one:
> 
>     def finish_response(write_func,app_return):
>         if app_return:
>             try:
>                 map(write_func,app_return)
>             finally:
>                 if hasattr(app_return,'close'):
>                     app_return.close()
> 
> Such a routine would come in handy for response-munging middleware.

I believe you also have to close the GzipFile, as it won't flush its 
final output until that happens.  So the finally block has to include 
that as well.  That makes finish_response a bit less of a win.  And 
again, map is clever but something of an abuse of the function, and not 
appropriate for any example code.
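
A minimal sketch of that point, assuming modern Python 3 (bytes chunks)
rather than the Python 2 of this thread.  The extra_close parameter is
hypothetical; it is just one way to let the finally block close the
GzipFile as well as the application's iterable:

```python
import gzip
import io


def finish_response(write_func, app_return, extra_close=()):
    """Feed every chunk to write_func, then close whatever needs closing."""
    try:
        if app_return:
            for chunk in app_return:  # explicit loop; no results are collected
                write_func(chunk)
    finally:
        for closer in extra_close:  # e.g. gzip_file.close, which flushes the trailer
            closer()
        if hasattr(app_return, "close"):
            app_return.close()


raw = io.BytesIO()
gzip_file = gzip.GzipFile(filename="", mode="wb", compresslevel=5, fileobj=raw)
finish_response(gzip_file.write, [b"Hello ", b"world!"],
                extra_close=[gzip_file.close])

# Only after close() is the compressed stream complete and decodable.
assert gzip.decompress(raw.getvalue()) == b"Hello world!"
```

Closing the GzipFile does not close the underlying raw object when it
was passed in as fileobj, so the compressed bytes stay readable.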

>>   And I think it should take advantage of its server's iteration, but 
>> currently it only uses the "push" (write function) aspect of the 
>> server.  But I'm not sure how exactly I would do that, especially so 
>> that the iteration actually had any beneficial properties.
> 
> 
> For the given application, it's not important.  Gzipping a server push 
> stream probably doesn't make a lot of sense.  :)

How so?

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Wed Aug 25 03:53:08 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 03:53:11 2004
Subject: [Web-SIG] WSGI sample applications
Message-ID: <412BF104.6010200@colorstudy.com>

I've started writing some sample code using WSGI.  So far just a working 
version of the gzip-encoder, a hello world app, the CGI server example 
from the WSGI PEP, and a small URL dispatcher.

   svn://colorstudy.com/trunk/WSGI/
   http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/

I'll be trying to make some other applications as time goes by, kind of 
according to http://blog.colorstudy.com/ianb/weblog/2004/08/22.html#P150

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From floydophone at gmail.com  Wed Aug 25 04:56:36 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Wed Aug 25 04:56:41 2004
Subject: [Web-SIG] Where do sessions fit in?
Message-ID: <6654eac40408241956694d9916@mail.gmail.com>

I've realized now how middleware works. Now, I'm wondering where
sessions would fit in. Would they be a piece of middleware, or an
extension? If so, what would the interface look like?
From ianb at colorstudy.com  Wed Aug 25 05:50:00 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 05:50:03 2004
Subject: [Web-SIG] Where do sessions fit in?
In-Reply-To: <6654eac40408241956694d9916@mail.gmail.com>
References: <6654eac40408241956694d9916@mail.gmail.com>
Message-ID: <412C0C68.4050305@colorstudy.com>

Peter Hunt wrote:
> I've realized now how middleware works. Now, I'm wondering where
> sessions would fit in. Would they be a piece of middleware, or an
> extension? If so, what would the interface look like?

There's a good chance the session would be implemented by the 
application/framework sitting on top of WSGI, so WSGI wouldn't factor in 
at all.

But middleware could implement the session.  It would change the 
environment dictionary, adding a key (like 'middleware_name.session'), 
which would be the session object.  The new key would be an "extension" 
of sorts (at least, that's the only extension WSGI has).  The session 
object then looks like, well, whatever the middleware makes it look like.

The advantage of having it in the middleware is that if several
frameworks agree on an interface for the session object, it can be
created early and then shared between all the applications, even if the
applications otherwise work very differently.
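
A rough sketch of what such middleware might look like, assuming
headers are a list of (name, value) tuples and that a plain dict is an
acceptable session object.  The key 'example.session', the cookie name
'sid' and the in-memory store are all hypothetical, chosen only for
illustration:

```python
import uuid
from http.cookies import SimpleCookie

_SESSIONS = {}  # hypothetical in-memory store; real code would persist and expire


def session_middleware(application, environ_key="example.session",
                       cookie_name="sid"):
    def with_session(environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        sid = cookie[cookie_name].value if cookie_name in cookie else None
        is_new = sid not in _SESSIONS
        if is_new:
            sid = uuid.uuid4().hex
            _SESSIONS[sid] = {}
        # The "extension": one extra key in the environment dictionary.
        environ[environ_key] = _SESSIONS[sid]

        def session_start_response(status, headers, *args):
            headers = list(headers)
            if is_new:
                headers.append(
                    ("Set-Cookie", "%s=%s; Path=/" % (cookie_name, sid)))
            return start_response(status, headers, *args)

        return application(environ, session_start_response)

    return with_session
```

Any application below the middleware then only needs to agree on the
environ key and on the session object behaving like a dictionary.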

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Wed Aug 25 06:32:44 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 06:32:38 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412BDF2D.5090404@colorstudy.com>
References: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>

At 07:37 PM 8/24/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Oops.  We both goofed: this should be:
>>     headers['content-encoding'] = ['gzip']
>
>Was there any resolution on how headers are going to work?  While it's 
>certainly more confusing to deal with a list of headers, as opposed to a 
>dictionary of headers, I feel like the whole thing is a little vague at 
>this point.
>
>Must all values be lists?  Other sequences?  Is it an error to put a 
>string there?  I fear I'd see a lot of:
>
>content-encoding: g
>content-encoding: z
>content-encoding: i
>content-encoding: p

I was thinking lists-only, so it's an error to use a string for *any* 
header.  If it's based on some kind of semantics, it's not easily extended, 
and if there's any mixed typing it increases the chances of messing it up.


>Must all keys be lower case?

Yes.


>   If not, headers aren't going to be any easier to work with as a 
> dictionary than as a list.  If they are required to be lower case, again 
> it seems like a fragile part of the spec.
>
>It all makes me think that it'd just be easier to write the four or so 
>functions to make lists of headers easy to deal with.

You could equally well write the functions to work on the dictionary of 
lists.  ;)

OTOH, I think it's probably best if the spec is strengthened to, "the 
server *must* report an immediate error if any of the header keys contain 
non-lowercase letters, or if any values are not lists."  That would help 
flush out any programming errors.
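
A sketch of what that strengthened check could look like on the server
side, assuming the dictionary-of-lists form under discussion.  Both
function names are hypothetical; header_lines stands in for a naive
server that skips the check:

```python
def check_headers(headers):
    """Raise immediately if keys are not lowercase or values are not lists."""
    for name, values in headers.items():
        if name != name.lower():
            raise ValueError("header name must be lowercase: %r" % (name,))
        if not isinstance(values, list):
            raise TypeError("value for %r must be a list, got %s"
                            % (name, type(values).__name__))
    return headers


def header_lines(headers):
    """What a server that does NOT check would emit."""
    return ["%s: %s" % (name, value)
            for name, values in headers.items()
            for value in values]


# Without the check, a bare string is silently iterated character by character.
assert header_lines({"content-encoding": "gzip"}) == [
    "content-encoding: g",
    "content-encoding: z",
    "content-encoding: i",
    "content-encoding: p",
]
```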

From pje at telecommunity.com  Wed Aug 25 06:41:34 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 06:41:17 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412BE21E.20504@colorstudy.com>
References: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com>

At 07:49 PM 8/24/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>     class GzipOutput(object):
>>         pass
>>     def gzip_middleware(application, compress_level=5):
>>         def do_gzip(environ, start_response):
>>             writer = []
>
>Using a list to simulate mutable inner scopes is hardly what I'd consider 
>a Hello World class of example!  While the trick works, it's not something 
>that I would do without a compelling reason; certainly not just to save 
>creating one class.

Hm.  To me the mutable inner scope thingy is more natural.  I'd blame it on 
my Lisp background, except I don't *have* a Lisp background...  :)


>>It may be that the PEP should contain a list of suggested utility 
>>functions, like this one:
>>     def finish_response(write_func,app_return):
>>         if app_return:
>>             try:
>>                 map(write_func,app_return)
>>             finally:
>>                 if hasattr(app_return,'close'):
>>                     app_return.close()
>>Such a routine would come in handy for response-munging middleware.
>
>I believe you also have to close the GzipFile, as it won't flush its final 
>output until that happens.  So the finally block has to include that as 
>well.  That makes finish_response a bit less of a win.  And again, map is 
>clever but something of an abuse of the function, and not appropriate for 
>any example code.

Abuse of the function?  That's what map() is *for*: to apply a function to 
each item in a sequence.  It's more compact and to the point than a list 
comprehension when all you're doing is applying a single function to a 
sequence of single arguments.  Perhaps I should also blame this on my 
imaginary Lisp background, where map is considered a 
primitive.  :)  (Actually, it's my 7 years of Python showing, since 'map()' 
was king before the advent of listcomps.)


>>For the given application, it's not important.  Gzipping a server push 
>>stream probably doesn't make a lot of sense.  :)
>
>How so?

Don't the subsequent responses have their own headers and transfer
encodings?  (By server push I mean a multipart response, which is also the
main scenario for calling write() more than once or yielding more than one
value and wanting the data to be immediately flushed.)

From ianb at colorstudy.com  Wed Aug 25 06:46:36 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 06:46:41 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com>
References: <5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824130106.02a19590@mail.telecommunity.com>
	<5.1.1.6.0.20040825003303.02abd070@mail.telecommunity.com>
Message-ID: <412C19AC.6000500@colorstudy.com>

Phillip J. Eby wrote:
>> I believe you also have to close the GzipFile, as it won't flush its 
>> final output until that happens.  So the finally block has to include 
>> that as well.  That makes finish_response a bit less of a win.  And 
>> again, map is clever but something of an abuse of the function, and 
>> not appropriate for any example code.
> 
> 
> Abuse of the function?  That's what map() is *for*: to apply a function 
> to each item in a sequence.  It's more compact and to the point than a 
> list comprehension when all you're doing is applying a single function 
> to a sequence of single arguments.  Perhaps I should also blame this on 
> my imaginary Lisp background, where map is considered a primitive.  :)  
> (Actually, it's my 7 years of Python showing, since 'map()' was king 
> before the advent of listcomps.)

Map (and list comprehensions) imply you are doing something with the
results, but you aren't; you just want to throw the results away.
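
A small illustration, assuming Python 3 rather than the Python 2 of this
thread.  There, map() returns a lazy iterator, so using it purely for
side effects does nothing unless the result is consumed, while a plain
loop says what is meant:

```python
written = []
chunks = ["Hello ", "world!"]

# Python 3: this only builds a lazy iterator; written.append is never called.
map(written.append, chunks)
assert written == []

# An explicit loop runs the side effect and keeps no throwaway results.
for chunk in chunks:
    written.append(chunk)
assert written == ["Hello ", "world!"]
```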

>>> For the given application, it's not important.  Gzipping a server 
>>> push stream probably doesn't make a lot of sense.  :)
>>
>>
>> How so?
> 
> 
> Don't the subsequent responses have their own headers and transfer 
> encodings?  (By server push I mean a multipart response, which is also 
> the main scenario for calling write() more than once or yielding more 
> than one value and wanting the data to be immediately flushed.)

Oh, now I'm confused.  By push I just meant where the application pushes 
data to the server (the write callable) vs. the case where the server 
pulls from the application (the iterable).
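
The two styles, sketched side by side under the assumption that
start_response returns a write callable and that headers are a list of
tuples.  The run() helper is a hypothetical toy server used only to
show that both paths produce the same body:

```python
def push_app(environ, start_response):
    """Application pushes data to the server through write()."""
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"Hello ")
    write(b"world!")
    return []


def pull_app(environ, start_response):
    """Server pulls data from the application by iterating the result."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello ", b"world!"]


def run(application):
    """Toy server: collect output from both the write() and iterable paths."""
    body = []

    def start_response(status, headers, *args):
        return body.append

    result = application({}, start_response)
    try:
        for chunk in result:
            body.append(chunk)
    finally:
        if hasattr(result, "close"):
            result.close()
    return b"".join(body)


assert run(push_app) == run(pull_app) == b"Hello world!"
```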

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Wed Aug 25 06:52:28 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 06:52:32 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
References: <5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
Message-ID: <412C1B0C.4020107@colorstudy.com>

Phillip J. Eby wrote:
> I was thinking lists-only, so it's an error to use a string for *any* 
> header.  If it's based on some kind of semantics, it's not easily 
> extended, and if there's any mixed typing it increases the chances of 
> messing it up.
> 
> 
>> Must all keys be lower case?
> 
> 
> Yes.
[...]
> OTOH, I think it's probably best if the spec is strengthened to, "the 
> server *must* report an immediate error if any of the header keys 
> contain non-lowercase letters, or if any values are not lists."  That 
> would help flush out any programming errors.

All of these requirements make me wary.  It's not that hard to deal with 
a list of headers, and we don't have to make any of these requirements, 
and if the server doesn't check something you won't get bizarre bugs 
(like four content-encoding fields).  Keys can be any case, all values 
will always be strings (which aren't compound, and so people aren't 
likely to mess up).  The issues with a dictionary are just too great, 
without significant gain.  I'd be okay if we used a dictionary-like 
object that enforced these requirements, kind of like rfc822 defines, 
but that doesn't seem to be the direction WSGI is going.

I've been writing my middleware using lists of headers, and it's really 
not a problem.  There are some other annoyances, but that isn't one of 
them.  I'll write about the annoyances later, once I've actually got it 
all working, but those relate to other parts of the system.
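
For illustration, the handful of helpers for a list of (name, value)
tuples might look like the sketch below.  The function names are
hypothetical, and matching is case-insensitive so keys can be any case:

```python
def get_all(headers, name):
    """Return every value stored under name, ignoring case."""
    name = name.lower()
    return [v for k, v in headers if k.lower() == name]


def has_header(headers, name):
    return bool(get_all(headers, name))


def remove_header(headers, name):
    """Drop every occurrence of name, modifying the list in place."""
    name = name.lower()
    headers[:] = [(k, v) for k, v in headers if k.lower() != name]


def set_header(headers, name, value):
    """Replace any existing occurrences of name with a single new value."""
    remove_header(headers, name)
    headers.append((name, value))


headers = [("Content-Type", "text/html"), ("Set-Cookie", "a=1")]
if not has_header(headers, "content-encoding"):
    set_header(headers, "Content-Encoding", "gzip")
assert get_all(headers, "CONTENT-ENCODING") == ["gzip"]
```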

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Wed Aug 25 07:00:34 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 07:00:41 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412C1B0C.4020107@colorstudy.com>
References: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com>

At 11:52 PM 8/24/04 -0500, Ian Bicking wrote:
>I'd be okay if we used a dictionary-like object that enforced these 
>requirements, kind of like rfc822 defines, but that doesn't seem to be the 
>direction WSGI is going.

If there's an implementation already available in the stdlib for 2.2 and 
up, that's not constantly in flux (like the 'email' package), I'd consider 
it.  I just *really* don't want another long thread about what the methods 
should be named and what their precise semantics should be.  :)

In the meantime, I'm fine with headers remaining as they were in the 
previous draft: i.e. a sequence of tuples.

From pje at telecommunity.com  Wed Aug 25 07:22:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 07:22:12 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com>
References: <412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>

At 01:00 AM 8/25/04 -0400, Phillip J. Eby wrote:
>At 11:52 PM 8/24/04 -0500, Ian Bicking wrote:
>>I'd be okay if we used a dictionary-like object that enforced these 
>>requirements, kind of like rfc822 defines, but that doesn't seem to be 
>>the direction WSGI is going.
>
>If there's an implementation already available in the stdlib for 2.2 and 
>up, that's not constantly in flux (like the 'email' package), I'd consider 
>it.  I just *really* don't want another long thread about what the methods 
>should be named and what their precise semantics should be.  :)
>
>In the meantime, I'm fine with headers remaining as they were in the 
>previous draft: i.e. a sequence of tuples.

Hm.  Looking at 'email.Message', actually, it has all the semantics needed 
for header management, and it looks like the interface at least is stable 
across 2.2 and 2.3 (I haven't checked 2.4.)

The code is relatively brief, and I think I'd be okay with using it as the 
type for 'headers'.  Anybody have any objections?  Here's sample usage:

     from email.Message import Message

     def application(env, start):
         headers = Message()
         headers.set_type("text/plain")
         headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
                            path="/foobar")
         start("200 OK", headers)("Hello world!")

One of the nice things about it is that it makes it easier to do MIME and 
HTTP headers that have parameter info.

From ianb at colorstudy.com  Wed Aug 25 08:25:33 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 08:25:38 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
References: <412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
Message-ID: <412C30DD.4060401@colorstudy.com>

Phillip J. Eby wrote:
> Hm.  Looking at 'email.Message', actually, it has all the semantics 
> needed for header management, and it looks like the interface at least 
> is stable across 2.2 and 2.3 (I haven't checked 2.4.)
> 
> The code is relatively brief, and I think I'd be okay with using it as 
> the type for 'headers'.  Anybody have any objections?  Here's sample usage:
> 
>     from email.Message import Message
> 
>     def application(env, start):
>         headers = Message()
>         headers.set_type("text/plain")
>         headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
>                            path="/foobar")
>         start("200 OK", headers)("Hello world!")
> 
> One of the nice things about it is that it makes it easier to do MIME 
> and HTTP headers that have parameter info.

Seems like an appropriate object.  This part certainly should be stable, 
since they are deprecating mimetools and rfc822, with email replacing those.

At first it seemed a little annoying that content-type was handled 
differently, but because it's the one required header it actually seems 
pretty reasonable.

It seems like there are a few things that are a little inappropriate
for HTTP: multipart, unixfrom, attach, payload, filename, boundary,
preamble, epilogue.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Wed Aug 25 08:39:43 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 08:39:47 2004
Subject: [Web-SIG] WSGI: catching exception
Message-ID: <412C342F.70203@colorstudy.com>

I wrote a couple of pieces of middleware that catch exceptions.  One
defines exceptions like HTTPTemporaryRedirect, so you can raise those
exceptions and it catches them and turns them into proper HTTP
responses.  The other catches all unhandled exceptions and formats them
with cgitb.  (Obviously the two have to be nested in the right order.)

Anyway, it felt difficult to handle exceptions, for two reasons:

One place is around the application invocation, which looks like:

try:
     return application(environ, start_response)
except:
     blah blah

Except "blah blah" almost certainly depends on whether start_response 
has been called, so it knows if it has to call start_response, or just 
deal with a partially completed response.  So I had to wrap 
start_response in another function that detected if it had been called. 
  This also would create what I believe is a false negative if you were 
comparing start_response at different points in the request, as we 
discussed for certain output-shortcutting extensions.

So, it would be nice if there was an easier way to tell where in the 
request we were, i.e., if headers had been sent.
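
A sketch of the wrapping described above, assuming headers are a list
of tuples.  The name catch_errors and the plain-text 500 response are
hypothetical; the point is only the flag that records whether
start_response has already been called:

```python
def catch_errors(application):
    def wrapper(environ, start_response):
        started = []  # non-empty once the application has called start_response

        def tracking_start_response(status, headers, *args):
            started.append(True)
            return start_response(status, headers, *args)

        try:
            return application(environ, tracking_start_response)
        except Exception:
            if started:
                # Headers were already handed to the server; a fresh
                # response can no longer be started here, so re-raise.
                raise
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"Internal Server Error"]

    return wrapper
```

This only covers errors raised while the application callable itself
runs; errors raised later, while the server iterates over the result,
happen outside this try block.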

The other hard part is dealing with the iterator.  I had to wrap the 
iterator, with something like:

def wrap_iter(app_iter):
     try:
         for s in app_iter:
             yield s
     except:
         blah blah

There we know headers have been sent.  But it's a bit annoying that the 
except has to be done twice.  I was also getting some behavior I have 
yet to understand when I was nesting gzipper and cgitb_catcher, using a 
URL like:

.../WSGI/dispatch.cgi/cgitb_catcher.middle/gzipper.middle/httpexceptions.middle/echo?error=iter

This is using the modules in svn://colorstudy.com/trunk/WSGI (symlinking 
dispatch.py to dispatch.cgi).  Actually, now that I look at it, I think 
it's an issue with gzipper not dealing well with exceptions, though I 
guess that's another exception issue.
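
One way the iterator wrapper could be filled in, as a sketch only.  The
on_error callback is a hypothetical hook that turns the exception into
a final chunk of output (or None to emit nothing), and the finally
block forwards close() to the wrapped iterable:

```python
def wrap_iter(app_iter, on_error):
    """Yield chunks from app_iter; on failure, emit on_error(exc) instead."""
    try:
        for chunk in app_iter:
            yield chunk
    except Exception as exc:
        # Headers have already been sent at this point, so all that can be
        # done is append something to the partially completed body.
        replacement = on_error(exc)
        if replacement is not None:
            yield replacement
    finally:
        if hasattr(app_iter, "close"):
            app_iter.close()


def failing_body():
    yield b"first"
    raise RuntimeError("boom")


chunks = list(wrap_iter(failing_body(), lambda exc: b"[error]"))
assert chunks == [b"first", b"[error]"]
```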

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From tony at lownds.com  Wed Aug 25 17:05:42 2004
From: tony at lownds.com (tony@lownds.com)
Date: Wed Aug 25 17:22:58 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
References: <412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
Message-ID: <49996.67.124.88.63.1093446342.squirrel@*>

>>In the meantime, I'm fine with headers remaining as they were in the
>>previous draft: i.e. a sequence of tuples.
>

+1

> Hm.  Looking at 'email.Message', actually, it has all the semantics needed
> for header management, and it looks like the interface at least is stable
> across 2.2 and 2.3 (I haven't checked 2.4.)
>
> The code is relatively brief, and I think I'd be okay with using it as the
> type for 'headers'.  Anybody have any objections?  Here's sample usage:
>

It's a nice idea, and it would probably simplify both server and
application code and the spec. But it forces an implementation. I think
inclusion in the PEP as a possible change before 1.0 will give the idea
plenty of discussion time.

>      from email.Message import Message
>
>      def application(env, start):
>          headers = Message()
>          headers.set_type("text/plain")
>          headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
>                             path="/foobar")
>          start("200 OK", headers)("Hello world!")

Just call the items() method, and WSGI remains the same

          start("200 OK", headers.items())("Hello world!")

>
> One of the nice things about it is that it makes it easier to do MIME and
> HTTP headers that have parameter info.
>

One issue: after the set_type() call, an extra "MIME-Version: 1.0" header
is present. According to the HTTP 1.1 spec, when that header is present,
whatever the server sends must be in "full compliance" with the MIME
protocol. From reading the MIME spec, I guess adding
"Content-Transfer-Encoding: binary" would take care of that...
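
A quick way to see this, sketched against the Python 3 spelling of the
module (email.message rather than the email.Message used in this
thread).  The build_headers helper and its drop_mime_version flag are
hypothetical; deleting the header is just one possible workaround, not
something the draft specifies:

```python
from email.message import Message


def build_headers(drop_mime_version=False):
    headers = Message()
    headers.set_type("text/plain")  # also adds MIME-Version: 1.0
    headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
                       path="/foobar")
    if drop_mime_version:
        del headers["MIME-Version"]  # header lookup is case-insensitive
    return headers.items()  # plain list of (name, value) tuples


names = [name for name, value in build_headers()]
assert "MIME-Version" in names

names = [name for name, value in build_headers(drop_mime_version=True)]
assert "MIME-Version" not in names
```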

-Tony


From pje at telecommunity.com  Wed Aug 25 17:55:39 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 17:55:27 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412C30DD.4060401@colorstudy.com>
References: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825115314.02c17140@mail.telecommunity.com>

At 01:25 AM 8/25/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Hm.  Looking at 'email.Message', actually, it has all the semantics 
>>needed for header management, and it looks like the interface at least is 
>>stable across 2.2 and 2.3 (I haven't checked 2.4.)
>>The code is relatively brief, and I think I'd be okay with using it as 
>>the type for 'headers'.  Anybody have any objections?  Here's sample usage:
>>     from email.Message import Message
>>     def application(env, start):
>>         headers = Message()
>>         headers.set_type("text/plain")
>>         headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
>>                            path="/foobar")
>>         start("200 OK", headers)("Hello world!")
>>One of the nice things about it is that it makes it easier to do MIME and 
>>HTTP headers that have parameter info.
>
>Seems like an appropriate object.  This part certainly should be stable, 
>since they are deprecating mimetools and rfc822, with email replacing those.
>
>At first it seemed a little annoying that content-type was handled 
>differently, but because it's the one required header it actually seems 
>pretty reasonable.

Actually, there's nothing stopping you from using the normal features to 
manipulate content-type; but 'set_type()' is more convenient.


>It seems like there are a few things that are a little inappropriate 
>for HTTP: multipart, unixfrom, attach, payload, filename, boundary, 
>preamble, epilogue.

I don't really see an issue there; if need be we can list the "approved" 
methods.

From pje at telecommunity.com  Wed Aug 25 18:04:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 18:04:11 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <49996.67.124.88.63.1093446342.squirrel@*>
References: <5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com>

At 08:05 AM 8/25/04 -0700, tony@lownds.com wrote:
> >>In the meantime, I'm fine with headers remaining as they were in the
> >>previous draft: i.e. a sequence of tuples.
> >
>
>+1
>
> > Hm.  Looking at 'email.Message', actually, it has all the semantics needed
> > for header management, and it looks like the interface at least is stable
> > across 2.2 and 2.3 (I haven't checked 2.4.)
> >
> > The code is relatively brief, and I think I'd be okay with using it as the
> > type for 'headers'.  Anybody have any objections?  Here's sample usage:
> >
>
>It's a nice idea, and it would probably simplify both server and
>application code
>and the spec. But, it forces an implementation.

But it's available in the standard library, and therefore will be *one* 
implementation, and thus have only one set of bugs to work around per 
Python version.  :)


>Just call the items() method, and WSGI remains the same
>
>           start("200 OK", headers.items())("Hello world!")

Quite so.  But turning the items back into headers is more complex, if 
middleware wants to manipulate them, e.g.:

     for n,v in headers:
         msg.add_header(n,v)

In any case, email.Message is actually a very thin wrapper over a list of 
name,value pairs!  It just provides the needed functionality to manipulate 
the headers.
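
A sketch of that round trip for middleware, again using the Python 3
module name email.message.  The helper headers_to_message is
hypothetical; replace_header, get_all and item deletion are the
standard Message methods:

```python
from email.message import Message


def headers_to_message(header_items):
    """Rebuild a Message from a list of (name, value) pairs."""
    msg = Message()
    for name, value in header_items:
        msg.add_header(name, value)
    return msg


items = [("Content-Type", "text/html"),
         ("Set-Cookie", "a=1"),
         ("Set-Cookie", "b=2")]
msg = headers_to_message(items)

# Lookups are case-insensitive and repeated headers are preserved.
assert msg.get_all("set-cookie") == ["a=1", "b=2"]

msg.replace_header("content-type", "text/plain")  # keeps position and name case
del msg["set-cookie"]                             # removes every occurrence
assert msg.items() == [("Content-Type", "text/plain")]
```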


>One issue: after the m.set_type call, an extra MIME-Version: 1.0 header is
>present.
>According to the HTTP 1.1 spec, when that header is present, whatever the
>server sends must be in "full compliance" with the MIME protocol. From
>reading the MIME spec, I guess adding Content-transfer-encoding: binary
>would take care of that...

We could require the server to add a c-t-e header if it's missing and 
MIME-Version is present, i.e.:

     if ('MIME-Version' in headers and
         'Content-Transfer-Encoding' not in headers
     ):
         headers['Content-Transfer-Encoding'] = "whatever"

From ianb at colorstudy.com  Wed Aug 25 18:05:52 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed Aug 25 18:07:29 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <49996.67.124.88.63.1093446342.squirrel@*>
References: <412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<49996.67.124.88.63.1093446342.squirrel@*>
Message-ID: <412CB8E0.7040502@colorstudy.com>

tony@lownds.com wrote:
> It's a nice idea, and it would probably simplify both server and
> application code
> and the spec. But, it forces an implementation. I think inclusion in the
> PEP as a possible
> change before 1.0, will give the idea plenty of discussion time.

I agree; I don't think this needs to be resolved before making it an
official PEP.

> 
>>     from email.Message import Message
>>
>>     def application(env, start):
>>         headers = Message()
>>         headers.set_type("text/plain")
>>         headers.add_header("Set-Cookie", "CUSTOMER=WILE_E_COYOTE",
>>                            path="/foobar")
>>         start("200 OK", headers)("Hello world!")
> 
> 
> Just call the items() method, and WSGI remains the same
> 
>           start("200 OK", headers.items())("Hello world!")
> 
> 
>>One of the nice things about it is that it makes it easier to do MIME and
>>HTTP headers that have parameter info.
>>
> 
> 
> One issue: after the m.set_type call, an extra MIME-Version: 1.0 header is
> present.
> According to the HTTP 1.1 spec, when that header is present, whatever the
> server sends must be in "full compliance" with the MIME protocol. From
> reading the MIME spec, I guess adding Content-transfer-encoding: binary
> would take care of that...

Is it only after set_type then, not add_header('content-type',...)? 
Adding that header implicitly is rather annoying.

It's too bad there's not a simpler superclass to email.Message that 
implements just the header part, and not the email/MIME part.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Wed Aug 25 18:54:50 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 18:54:33 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <50583.67.124.88.63.1093450925.squirrel@*>
References: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com>

(Tony: I'm assuming this was intended for web-sig; discussions like this 
should be archived "for the record".  I hope that's not an issue for you.)


At 09:22 AM 8/25/04 -0700, tony@lownds.com wrote:
> >
> >>Just call the items() method, and WSGI remains the same
> >>
> >>           start("200 OK", headers.items())("Hello world!")
> >
> > Quite so.  But turning the items back into headers is more complex, if
> > middleware wants to manipulate them, e.g.:
> >
> >      for n,v in headers:
> >          msg.add_header(n,v)
> >
>
>Yes, that is an advantage. But applications with their own header
>manipulation library would need to do that as well, if a Message()
>instance was required.

But how many of those applications currently use a list of key,value pairs 
as their data structure?  They're going to have to loop over whatever they 
actually use, or build it up piece by piece, or do whatever it is that they do.

I'm pretty much assuming that current apps/frameworks will need to generate 
a 'headers' structure, so as long as it's a simple loop, it doesn't much 
matter what goes in the body of that loop.

Also, frameworks are usually going to have only one place to create WSGI 
headers, but middleware is by definition intended to be stacked.  And, it's 
more likely that a single author will write multiple middleware components, 
than WSGI wrappers for multiple frameworks.  So, simplifying the job of 
middleware authors, if it doesn't significantly burden framework authors, 
is a good thing here, I think.  (Since the only framework authors who would 
be burdened by the change are those who already use a precisely compliant
data structure; everyone else had to write a loop anyway.)
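
To make the middleware cost concrete, here is a rough sketch of the
round trip a middleware component has to do when headers travel as a
list of (name, value) pairs. It assumes the modern
email.message.Message class; headers_to_message() and
add_cache_header() are hypothetical helpers, not part of any draft.

```python
from email.message import Message


def headers_to_message(headers):
    """Build a Message from a list of (name, value) pairs."""
    msg = Message()
    for name, value in headers:
        msg.add_header(name, value)
    return msg


def add_cache_header(headers):
    """Hypothetical middleware step: edit headers, hand back pairs."""
    msg = headers_to_message(headers)
    # Header lookup on a Message is case-insensitive.
    if "Cache-Control" not in msg:
        msg["Cache-Control"] = "no-cache"
    return msg.items()


result = add_cache_header([("Content-Type", "text/plain")])
```

If the Message object itself were passed along, every stacked
middleware component could skip the conversion in both directions.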


> > We could require the server to add a c-t-e header if it's missing and
> > MIME-Version is present, i.e.:
> >
> >      if ('MIME-Version' in headers and
> >          'Content-Transfer-Encoding' not in headers
> >      ):
> >          headers['Content-Transfer-Encoding'] = "whatever"
> >
> >
>
>I think that warning applications about this implication of set_type would
>be sufficient.
>
>Content-transfer-encoding is assumed to be 7bit if not present, it's not
>required by MIME. 7bit would be wrong for a lot of HTTP responses though.

In the current spec, the server is already required to ensure validity of 
the headers; this would just be a specific mention of one example of that.
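
A sketch of what that server-side fix-up might look like if headers
stay a list of (name, value) pairs. The function name is hypothetical,
and the default of "binary" is only Tony's earlier guess at a suitable
value, not something the draft specifies.

```python
def ensure_transfer_encoding(headers, encoding="binary"):
    """Return headers, adding Content-Transfer-Encoding if needed.

    The header is added only when MIME-Version is present and no
    Content-Transfer-Encoding was supplied by the application.
    """
    names = {name.lower() for name, _ in headers}
    fixed = list(headers)
    if "mime-version" in names and "content-transfer-encoding" not in names:
        fixed.append(("Content-Transfer-Encoding", encoding))
    return fixed


fixed = ensure_transfer_encoding(
    [("MIME-Version", "1.0"), ("Content-Type", "text/plain")]
)
```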

From pje at telecommunity.com  Wed Aug 25 19:04:28 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Aug 25 19:04:11 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <412CB8E0.7040502@colorstudy.com>
References: <49996.67.124.88.63.1093446342.squirrel@*>
	<412C1B0C.4020107@colorstudy.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com>
	<49996.67.124.88.63.1093446342.squirrel@*>
Message-ID: <5.1.1.6.0.20040825125547.0231e0d0@mail.telecommunity.com>

At 11:05 AM 8/25/04 -0500, Ian Bicking wrote:
>tony@lownds.com wrote:
>>It's a nice idea, and it would probably simplify both server and
>>application code
>>and the spec. But, it forces an implementation. I think inclusion in the
>>PEP as a possible
>>change before 1.0, will give the idea plenty of discussion time.
>
>I agree, I don't think this needs to be resolved before making it an
>official PEP.

I'll mark it as an "Open Issue" in the PEP, providing sample code to show 
how it's used.

Might as well have *something* left for folks to argue about.  Maybe it'll 
provide a nice distraction from PEP 318.  :)


>Is it only after set_type then, not add_header('content-type',...)? Adding 
>that header implicitly is rather annoying.

Yeah, but not hard for the server to fix, either.  While I dislike forcing 
either side to have any "boilerplate" code, there will be fewer 
servers/gateways than middleware and frameworks, and being able to use 
email.Message should make response header manipulation as easy for 
middleware as request header manipulation is now.

From tony at lownds.com  Wed Aug 25 19:35:31 2004
From: tony at lownds.com (tony@lownds.com)
Date: Wed Aug 25 19:52:36 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com>
References: <5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com><5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com><412C1B0C.4020107@colorstudy.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><412B6C33.9080102@colorstudy.com><6654eac4040824075242be15dd@mail.gmail.com><20040824100006.1E15F1E400A@bag.python.org><6654eac4040824075242be15dd@mail.gmail.com><5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com><5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com><5.1.1.6.0.20040825010507.02a50a40@mail.telecommunity.com><5.1.1.6.0.20040825115721.02c188c0@mail.telecommunity.com>
	<5.1.1.6.0.20040825124719.0231c280@mail.telecommunity.com>
Message-ID: <51123.204.162.121.54.1093455331.squirrel@*>

> (Tony: I'm assuming this was intended for web-sig; discussions like this
> should be archived "for the record".  I hope that's not an issue for you.)
>

Simple oversight, sorry!

>  So, simplifying the job of
> middleware authors, if it doesn't significantly burden framework authors,
> is a good thing here, I think.  (Since the only framework authors who
> would
> be burdened by the change are those who already use  a precisely compliant
> data structure; everyone else had to write a loop anyway.)
>

I agree with that. I liked the simplicity and non-mutability of a sequence
of tuples. Look forward to hearing what a wider audience thinks, after
PEPing!

>>Content-transfer-encoding is assumed to be 7bit if not present, it's not
>>required by MIME. 7bit would be wrong for a lot of HTTP responses though.
>
> In the current spec, the server is already required to ensure validity of
> the headers; this would just be a specific mention of one example of that.
>

Ok

-Tony

From brsizer at kylotan.eidosnet.co.uk  Thu Aug 26 20:30:22 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Thu Aug 26 20:28:52 2004
Subject: [Web-SIG] Regarding the WSGI draft
Message-ID: <412E2C3E.7000900@kylotan.eidosnet.co.uk>

I've read through the draft and most of the messages on this list that 
followed it. However, I have a basic problem with it which I will 
attempt to summarise below.

The focus seems to be on making frameworks more portable. The abstract 
reads "This document specifies a proposed standard interface between web 
servers and Python web applications or frameworks, to promote web 
application portability across a variety of web servers." This is all 
well and good, but the implications from that point onwards are that 
we're firmly dealing with frameworks rather than applications. Phillip 
J. Eby has commented on Ian Bicking's blog that "at this stage, the 
benefits of WSGI are primarily for web *framework* authors, and web 
*server* authors, not web *application* authors. This is *not* an 
application API, it's a framework-to-server glue API."

This immediately strikes me as odd, because from my previous development 
experience frameworks are not that important. In fact, I'm heavily 
inclined to believe that Python only has a proliferation of frameworks 
because of the currently poor degree of higher level support for web 
development in general, and the various frameworks attempt to bridge 
that gap. Create better general web support for Python, and frameworks 
will only be necessary for the really heavy duty applications. Create 
the ability to make frameworks more portable, and all you do is 
encourage more people to develop more frameworks. Focusing on making 
life easier for framework developers is solving the wrong problem, in my 
opinion.

I come from an ASP and PHP background and generally speaking, a 
developer doesn't want or need a framework between their code and the 
web-scripting language when developing on those platforms. On the rare 
occasions that you do use a framework (such as PHP-Nuke) it's because 
you want to simplify high level activities like news management and user 
lists, and allow people to add content without needing to know HTML. By 
contrast Python's frameworks tend to address the trivial, low level 
things that should fall under the 'batteries included' philosophy that 
Python subscribes to.

The front page of this Python Web-SIG suggests, "pick a Web framework 
that already exists, make a functionality checklist from it, and add 
that functionality to a new webserver module." I think that's what is 
needed most of all - some sort of standard approach that new Python 
programmers can jump right in and use, which doesn't require choosing 
one of several different frameworks.

What I'd like to see is something mirroring the Python Database API. For 
instance, I might have to change "import MySQLdb" to "import pyPgSQL" 
but I know that 99% of the rest of the database code will work fine. As 
a web developer I would like to be able to change "import cgi" to 
"import mod_python" or "import fastcgi" and know that, if I follow a 
standard set of calls, I will have a simple and standard way of 
producing a web document. The standardised access to the output and 
input streams in the current draft is all well and good but there's 
little point in me making use of that abstraction if I still have to 
rely on extra modules for access to useful higher-level concepts such as:

- dispatching control flow based on the URI
- session management and cookies
- GET/query string parsing
- POST/form parsing
- ASP + PHP style templating

If these things are coming soon in future WSGI drafts, then great! But I 
got the impression that these features were being delegated out to the 
legion of frameworks.

I am aware that this all sounds very negative, and I don't mean to 
criticise the hard work that Phillip and others have put into this draft 
specification. I just worry that it diverts attention from what I 
consider to be the real issue facing Python on the web, which is making 
life easier for web application developers, not framework developers.

-- 
Ben Sizer
From mnot at mnot.net  Thu Aug 26 20:51:06 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Thu Aug 26 20:51:11 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
Message-ID: <E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>

Hi Ben,

I understand where you're coming from, but I think we're in a different
situation here. There are a lot of different ways that you can construct
an application framework; there is no "one true way," because people have
varying requirements for a Web application.

Contrast this with databases, which are for the most part a commodity; 
you can plug in different databases because they all have the same 
conceptual model of how a database works.

There has been some progress towards convergence on a common view of 
what a Web application is, but I still think we have a ways to go, and 
much to learn, before any one application framework can declare 
victory.

That being the case, WSGI provides something that's incredibly 
valuable; as long as it maintains the right level of abstraction, it 
allows application frameworks to avoid worrying about the details of a 
particular server implementation.

I'm pleased as punch with it, because it lets me avoid doing that when 
I write my own application framework (details forthcoming ;).

Cheers,


On Aug 26, 2004, at 11:30 AM, Ben Sizer wrote:

> I've read through the draft and most of the messages on this list that 
> followed it. However, I have a basic problem with it which I will 
> attempt to summarise below.
>
> The focus seems to be on making frameworks more portable. The abstract 
> reads "This document specifies a proposed standard interface between 
> web servers and Python web applications or frameworks, to promote web 
> application portability across a variety of web servers." This is all 
> well and good, but the implications from that point onwards are that 
> we're firmly dealing with frameworks rather than applications. Phillip 
> J. Eby has commented on Ian Bicking's blog that "at this stage, the 
> benefits of WSGI are primarily for web *framework* authors, and web 
> *server* authors, not web *application* authors. This is *not* an 
> application API, it's a framework-to-server glue API."
>
> This immediately strikes me as odd, because from my previous 
> development experience frameworks are not that important. In fact, I'm 
> heavily inclined to believe that Python only has a proliferation of 
> frameworks because of the currently poor degree of higher level 
> support for web development in general, and the various frameworks 
> attempt to bridge that gap. Create better general web support for 
> Python, and frameworks will only be necessary for the really heavy 
> duty applications. Create the ability to make frameworks more 
> portable, and all you do is encourage more people to develop more 
> frameworks. Focusing on making life easier for framework developers is 
> solving the wrong problem, in my opinion.
>
> I come from an ASP and PHP background and generally speaking, a 
> developer doesn't want or need a framework between their code and the 
> web-scripting language when developing on those platforms. On the rare 
> occasions that you do use a framework (such as PHP-Nuke) it's because 
> you want to simplify high level activities like news management and 
> user lists, and allow people to add content without needing to know 
> HTML. By contrast Python's frameworks tend to address the trivial, low 
> level things that should fall under the 'batteries included' 
> philosophy that Python subscribes to.
>
> The front page of this Python Web-SIG suggests, "pick a Web framework 
> that already exists, make a functionality checklist from it, and add 
> that functionality to a new webserver module." I think that's what is 
> needed most of all - some sort of standard approach that new Python 
> programmers can jump right in and use, which doesn't require choosing 
> one of several different frameworks.
>
> What I'd like to see is something mirroring the Python Database API. 
> For instance, I might have to change "import MySQLdb" to "import 
> pyPgSQL" but I know that 99% of the rest of the database code will 
> work fine. As a web developer I would like to be able to change 
> "import cgi" to "import mod_python" or "import fastcgi" and know that, 
> if I follow a standard set of calls, I will have a simple and standard 
> way of producing a web document. The standardised access to the output 
> and input streams in the current draft is all well and good but 
> there's little point in me making use of that abstraction if I still 
> have to rely on extra modules for access to useful higher-level 
> concepts such as:
>
> - dispatching control flow based on the URI
> - session management and cookies
> - GET/query string parsing
> - POST/form parsing
> - ASP + PHP style templating
>
> If these things are coming soon in future WSGI drafts, then great! But 
> I got the impression that these features were being delegated out to 
> the legion of frameworks.
>
> I am aware that this all sounds very negative, and I don't mean to 
> criticise the hard work that Phillip and others have put into this 
> draft specification. I just worry that it diverts attention from what 
> I consider to be the real issue facing Python on the web, which is 
> making life easier for web application developers, not framework 
> developers.
>
> -- 
> Ben Sizer

--
Mark Nottingham     http://www.mnot.net/

From brsizer at kylotan.eidosnet.co.uk  Thu Aug 26 21:46:07 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Thu Aug 26 21:48:28 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <412E3DFF.2000605@kylotan.eidosnet.co.uk>

Mark Nottingham wrote:

> I understand where you're coming from, but I think we're in a different 
> situation here. There are a lot of different ways
> that you can construct an application framework; there is no "one true 
> way," because people have varying requirements for a Web application.

...

> There has been some progress towards convergence on a common view of 
> what a Web application is, but I still think we have a ways to go, and 
> much to learn, before any one application framework can declare victory.

Although what you say makes sense on the surface, the fact remains that 
technologies such as ASP and PHP are popular and useful because they 
present a simple and standard interface to the user, whether that user 
is writing a 4 line script, a small application, or a large framework 
upon which to base other applications. With Python you seem stuck with 
two equally unappealing options: slow CGI if you want a simple script, 
where simple is relative since you need to fool around with os.environ, 
printing your own headers, etc - or a complex and idiosyncratic 
framework if you want anything non-trivial, but which is often just as 
complex as PHP straight out of the box, except with a much smaller user 
base and generally less documentation.

For example, you know that $_GET[varName] is going to be the standard 
way of accessing a querystring variable in PHP. Yet in Python it could 
be part of a request.form dictionary, or 
cgi.parse_qs(os.environ['QUERY_STRING']), or 
mod_python.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query
strings are part of the RFC2396 standard, so why not have a standard 
module or interface to present to the user?
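
For what it's worth, a single spelling is possible wherever the request
arrives as a CGI-style mapping with a QUERY_STRING key, which is what
os.environ holds under CGI. This sketch assumes a modern Python, where
the parser lives in urllib.parse rather than cgi; query_params() is a
hypothetical helper name.

```python
from urllib.parse import parse_qs


def query_params(environ):
    """Parse the query string from a CGI-style environment mapping."""
    return parse_qs(environ.get("QUERY_STRING", ""))


# Each key maps to a list, because a name may appear more than once.
params = query_params({"QUERY_STRING": "varName=spam&tag=a&tag=b"})
print(params["varName"][0])
```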

I don't see any good reason for this sort of variance, except that
there's a bias towards accommodating these existing frameworks rather
than enabling simpler applications of the future, which I think is a
symptom of the problem rather than part of the solution.

-- 
Ben Sizer.


From pje at telecommunity.com  Thu Aug 26 21:59:07 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 26 21:58:48 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
Message-ID: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>

At 07:30 PM 8/26/04 +0100, Ben Sizer wrote:
>I just worry that it diverts attention from what I consider to be the real 
>issue facing Python on the web, which is making life easier for web 
>application developers, not framework developers.

Unfortunately, every effort to date to create a "framework to end all 
frameworks" has simply resulted in the existence of framework 
N+1.  Why?  Because the creation of a *new* framework means that there is 
no existing code that uses it.  And if the framework only provides features 
that others already have, there's no compelling reason to switch.

Any approach that ignores the economic reality of present-day Python web 
apps, and provides no way for them to migrate gradually to a new standard, 
is doomed to niche status at best.  (Comparison to ASP and PHP is 
misleading: both had standards for dispatching, sessions, cookies, form 
parsing, and templating *when they were created*, so there was no legacy 
codebase using alternative solutions that had to be migrated.)

And so, the only way we're going to "steal" the marketshare of existing 
frameworks is with the consent and co-operation of the developers of those 
frameworks.  That means there has to be enough benefit for them to justify 
the effort of getting on board.

So, please allow me to reveal my top-secret plan for total world 
domination...  :)

First, the current situation.  Choice of framework is a high investment for 
users, because once they choose, they are stuck with that framework and 
possibly server.  The cost to switch is extremely high.  It's almost as 
though every plumbing manufacturer makes their own sizes of pipes and 
connectors, so once you choose a vendor, you're stuck with them.

WSGI changes this scenario by introducing competitive pressure to the 
server/framework choice.  As soon as enough framework and server developers 
participate, the others are pushed by network effects to do the 
same.  Users ask, "Why can't I use your framework in any WSGI server?" and 
"Why can't I use any WSGI framework in your server?", pushing the slower 
adopters to either join up or be marginalized.

But this is just the first phase: standardizing on a size for one kind of 
pipe.  It's not very glamorous, but it fundamentally changes the 
marketplace, and causes many things to appear to spontaneously happen "on 
their own".

First, users can experiment with other frameworks, especially if those 
frameworks are lightweight.  This builds competitive pressure in the 
direction of lightweight, easy-to-integrate frameworks.  So framework 
developers begin to break their monolithic approaches down into smaller 
pieces that operate on segments of WSGI.  For example, a session service 
that you pass the incoming 'environ' and outgoing 'headers' to, in order 
for it to read and set cookies.  (Notice that this *isn't* a WSGI-defined 
or standardized service, just a service implemented *in terms of* WSGI.)
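
As a rough illustration of such a service, here is a sketch that takes
the incoming 'environ' and the outgoing 'headers' and does nothing but
read and set a cookie. Everything in it is an assumption made for the
example: the get_session() name, the in-memory SESSIONS store, the
cookie name, and headers being a mutable list of (name, value) pairs.

```python
import uuid
from http.cookies import SimpleCookie

# In-memory store, for illustration only: session id -> session dict.
SESSIONS = {}


def get_session(environ, headers, cookie_name="session_id"):
    """Return the session dict for this request, creating one if needed.

    Reads the cookie from environ["HTTP_COOKIE"] and, for a new session,
    appends a Set-Cookie pair to the outgoing headers list.
    """
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get(cookie_name)
    sid = morsel.value if morsel is not None else None
    if sid not in SESSIONS:
        sid = uuid.uuid4().hex
        SESSIONS[sid] = {}
        headers.append(("Set-Cookie", "%s=%s; Path=/" % (cookie_name, sid)))
    return SESSIONS[sid]


# First request: no cookie yet, so a session is created and a header set.
first_headers = []
session = get_session({}, first_headers)
session["user"] = "ben"
```

Nothing here depends on a particular framework or server; the service
only needs the two objects the interface already passes around.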

Such a service makes little sense to implement today, but people will 
spontaneously begin developing such services once WSGI is a ubiquitous part 
of the Python web development landscape.  It's the most natural thing in 
the world for them to do so, not only because it means a wider audience for 
their service, but because they're likely developing it for a WSGI-based 
environment they're already using.  What other platform would they write it 
for?

Because these services will be interchangeable to some degree, lock-in is 
limited and competition will determine a winner or winners.  Then, if the 
winners are sufficiently similar to allow useful standardization, that's 
the natural next step.  But, for some services, the differences will be 
important qualitative differences, and standardization would reduce 
meaningful choice.  We don't know in advance what these services should be, 
and we don't know enough to standardize on them now.

For someone with an ASP or PHP background, that last statement at least 
might sound like sheer lunacy.  But, Python web frameworks have often 
pioneered techniques years ahead of their appearance in ASP, PHP, and Java 
frameworks.  I would hate for us to lose our next great innovation to 
premature standardization.

But luckily, I don't need to worry: there's simply no way you'll get enough 
Python framework developers (and their users) to agree on such a 
standardization.  For one thing, it's not in their best interests to do 
so.  (Don't let me discourage you from trying, though, if that's what you 
want to do.  I just don't think you'll have much success, and am not 
interested in trying it myself.)


Anyway, there it is.  My secret plan to fundamentally alter the Python web 
programming universe through secret mind-control market manipulation and 
social engineering.  You found me out.  Now I'll have to kill you.*  :)


* "And I'd have gotten away with it too, if it hadn't been for those 
meddling kids..."


(Disclaimer for non-US readers: the above is a humorous reference to an 
American TV cartoon that featured a different character saying this line 
each week, after their nefarious plans were foiled.  It's not me calling 
anybody a meddling kid, or threatening to actually kill anyone!)

From rjkimble at alum.mit.edu  Thu Aug 26 22:11:38 2004
From: rjkimble at alum.mit.edu (Bob Kimble)
Date: Thu Aug 26 22:11:47 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412E3DFF.2000605@kylotan.eidosnet.co.uk>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
	<412E3DFF.2000605@kylotan.eidosnet.co.uk>
Message-ID: <200408261611.38389.rjkimble@alum.mit.edu>

On Thursday 26 August 2004 03:46 pm, Ben Sizer wrote:
> Mark Nottingham wrote:
> > I understand where you're coming from, but I think we're in a different
> > situation here. There are a lot of different ways
> > that you can construct an application framework; there is no "one true
> > way," because people have varying requirements for a Web application.
>
> ...
>
> > There has been some progress towards convergence on a common view of
> > what a Web application is, but I still think we have a ways to go, and
> > much to learn, before any one application framework can declare victory.
>
> Although what you say makes sense on the surface, the fact remains that
> technologies such as ASP and PHP are popular and useful because they
> present a simple and standard interface to the user, whether that user
> is writing a 4 line script, a small application, or a large framework
> upon which to base other applications. With Python you seem stuck with
> two equally unappealing options: slow CGI if you want a simple script,
> where simple is relative since you need to fool around with os.environ,
> printing your own headers, etc - or a complex and idiosyncratic
> framework if you want anything non-trivial, but which is often just as
> complex as PHP straight out of the box, except with a much smaller user
> base and generally less documentation.
>
> For example, you know that $_GET[varName] is going to be the standard
> way of accessing a querystring variable in PHP. Yet in Python it could
> be part of a request.form dictionary, or
> cgi.parse_qs(os.environ['QUERY_STRING']), or
> modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query
> strings are part of the RFC2396 standard, so why not have a standard
> module or interface to present to the user?
>
> I don't see any good reason for this sort of variance, except that
> there's a bias towards accommodating these existing frameworks rather
> than enabling simpler applications of the future, and which I think is a
> symptom of the problem rather than part of the solution.

I have been reading this thread for a while now, and I haven't commented 
because I have done absolutely no web development using Python. However, 
Mark's comments strike me as being dead on. I'm used to the Java Servlet API, 
which creates an API for servlets and JSP pages. The fact that there are 
several high quality application servers that all support this API suggests 
to me that creating something similar for Python makes a lot of sense. I have 
written JSP's and servlets and run them under Tomcat, but I know that I could 
just as easily run them under WebSphere, WebLogic, JRun, or any others that 
support the API. It seems to me that creating a similar API for Python would 
be terrific. Of course, somebody would also have to write an application 
server to support the API, but I suspect some of the existing frameworks 
could be revamped to support it. Anyway, that's my 2 cents. I would love to 
see something similar to Tomcat and the Java Servlet API for Python.
From titus at caltech.edu  Thu Aug 26 22:25:10 2004
From: titus at caltech.edu (Titus Brown)
Date: Thu Aug 26 22:23:03 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <200408261611.38389.rjkimble@alum.mit.edu>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
	<412E3DFF.2000605@kylotan.eidosnet.co.uk>
	<200408261611.38389.rjkimble@alum.mit.edu>
Message-ID: <20040826202510.GA5704@caltech.edu>

-> I have been reading this thread for a while now, and I haven't commented 
-> because I have done absolutely no web development using Python. However, 
-> Mark's comments strike me as being dead on. I'm used to the Java Servlet API, 
-> which creates an API for servlets and JSP pages. The fact that there are 
-> several high quality application servers that all support this API suggests 
-> to me that creating something similar for Python makes a lot of sense. I have 
-> written JSP's and servlets and run them under Tomcat, but I know that I could 
-> just as easily run them under WebSphere, WebLogic, JRun, or any others that 
-> support the API. It seems to me that creating a similar API for Python would 
-> be terrific. Of course, somebody would also have to write an application 
-> server to support the API, but I suspect some of the existing frameworks 
-> could be revamped to support it. Anyway, that's my 2 cents. I would love to 
-> see something similar to Tomcat and the Java Servlet API for Python.

<delurk>

I've implemented packages at the adapter level (PyWX), the framework
level (crud that was never released because I found Quixote first), and
the content level (based variously on CGI, WebWare, and Quixote).

I'm moderately skeptical of the short term use of the API being
developed on this list, because in practice it is relatively easy
to implement a framework that fits on top of all of the existing
adapters (CGI, mod_python, etc.)  Medium term, I think it will lead
to a welcome homogenization of server <--> adapter <--> framework
interaction, and so I think it's a valuable concept.

The idea of having a single framework (like Java's "servlets") is, I
think, silly.  Having implemented sites in several of the existing
frameworks, it is clear that there are several different ways to
conceptualize the development of Web sites: the Quixote style and
the WebWare style are two very distinct examples.  Anything that cuts
down on the variety of available frameworks is going to restrict the
options, which is bad.

However, I think it is incumbent upon the developers and users of the
different frameworks to clearly distinguish between the various options.
Right now it is very confusing to me, and I've been developing Web sites
in Python for 5 years ;).

I'm very confused as to why you need multiple servlet implementations in
Java.  Wouldn't one do just as well as 10?  It sounds like having 5
different implementations of the 'os' module in Python...

--titus
From pje at telecommunity.com  Thu Aug 26 22:45:01 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu Aug 26 22:44:39 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <20040826202510.GA5704@caltech.edu>
References: <200408261611.38389.rjkimble@alum.mit.edu>
	<412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
	<412E3DFF.2000605@kylotan.eidosnet.co.uk>
	<200408261611.38389.rjkimble@alum.mit.edu>
Message-ID: <5.1.1.6.0.20040826163917.01efdec0@mail.telecommunity.com>

At 01:25 PM 8/26/04 -0700, Titus Brown wrote:
>in practice it is relatively easy
>to implement a framework that fits on top of all of the existing
>adapters (CGI, mod_python, etc.)

How about FastCGI?  Medusa?  Twisted?  ZServer?  SCGI and PCGI?  ReadyExec?

It's only "relatively easy" in that you can define your own WSGI-like 
protocol, make adapters for some subset of the existing servers and 
gateways, and then document that protocol.  There's no sense in duplicating 
those efforts for each framework and each server or gateway in an N*M 
explosion, especially since the coverage in practice is quite incomplete, 
despite being "relatively easy" in principle.
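
To make the N*M point concrete, here is a minimal arithmetic sketch. The
framework and server lists are illustrative only (names borrowed from this
thread, not a survey), and the two helper functions are hypothetical.

```python
# Illustrative lists only; the names come from this thread, not from a survey.
frameworks = ["Quixote", "WebWare", "jonpy", "Snakelets"]
servers = ["CGI", "mod_python", "FastCGI", "Medusa", "Twisted", "SCGI"]


def adapters_without_standard(n_frameworks, n_servers):
    # Every framework needs its own adapter for every server or gateway.
    return n_frameworks * n_servers


def adapters_with_standard(n_frameworks, n_servers):
    # Each framework and each server implements the shared interface once.
    return n_frameworks + n_servers


pairwise = adapters_without_standard(len(frameworks), len(servers))
shared = adapters_with_standard(len(frameworks), len(servers))
print(pairwise, shared)
```

With four frameworks and six servers, that is 24 pairwise adapters against
10 implementations of one shared interface, and the gap widens as either
list grows.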

From fumanchu at amor.org  Thu Aug 26 22:50:12 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Thu Aug 26 22:55:40 2004
Subject: [Web-SIG] Regarding the WSGI draft
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E8C@exchange.hqamor.amorhq.net>

Phillip J. Eby wrote:
> First, the current situation.  Choice of framework is a high 
> investment for users, because once they choose, they are stuck
> with that framework and possibly server.  The cost to switch
> is extremely high.  It's almost as though every plumbing
> manufacturer makes their own sizes of pipes and connectors,
> so once you choose a vendor, you're stuck with them.
> 
> WSGI changes this scenario by introducing competitive pressure to the 
> server/framework choice.  As soon as enough framework and 
> server developers participate, the others are pushed by network
> effects to do the same.  Users ask, "Why can't I use your
> framework in any WSGI server?" and "Why can't I use any WSGI
> framework in your server?", pushing the slower adopters to either
> join up or be marginalized.

It's on my to-do list for *my* framework already... ;)

just-a-data-point-in-support-ly-yrs,


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From mnot at mnot.net  Thu Aug 26 23:27:58 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Thu Aug 26 23:28:02 2004
Subject: [Web-SIG] Re: Latest WSGI Draft (Phillip J. Eby)
In-Reply-To: <5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com>
References: <5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<412B6C33.9080102@colorstudy.com>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<20040824100006.1E15F1E400A@bag.python.org>
	<6654eac4040824075242be15dd@mail.gmail.com>
	<5.1.1.6.0.20040824140613.0239bec0@mail.telecommunity.com>
	<5.1.1.6.0.20040825002614.02ab8dd0@mail.telecommunity.com>
	<5.1.1.6.0.20040825005443.02ac8060@mail.telecommunity.com>
Message-ID: <CD3EC390-F7A6-11D8-82BE-000A95BD86C0@mnot.net>


On Aug 24, 2004, at 10:00 PM, Phillip J. Eby wrote:

> In the meantime, I'm fine with headers remaining as they were in the 
> previous draft: i.e. a sequence of tuples.

+1 to this or the email.Message solution; there are lots of different 
ways to add value to the way that headers are exposed, but let's keep 
it simple and conservative in WSGI.
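
As a rough illustration of the two options, the sketch below holds the same
headers first as a plain sequence of tuples and then in the standard
library's email.message.Message. The helper get_all() for the tuple form is
hypothetical, written only for this comparison.

```python
from email.message import Message

# Option 1: headers as a plain sequence of (name, value) tuples.
headers = [
    ("Content-Type", "text/plain"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]


def get_all(header_list, name):
    # Hypothetical helper: case-insensitive lookup that keeps repeated headers.
    wanted = name.lower()
    return [value for key, value in header_list if key.lower() == wanted]


# Option 2: the same headers held in an email.message.Message.
msg = Message()
for key, value in headers:
    msg[key] = value  # item assignment appends; it does not replace


cookies_from_list = get_all(headers, "set-cookie")
cookies_from_message = msg.get_all("set-cookie")
```

Both forms preserve repeated headers such as Set-Cookie; the tuple form
needs no imports, while Message supplies the case-insensitive lookup.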

Cheers,


--
Mark Nottingham     http://www.mnot.net/

From ianb at colorstudy.com  Fri Aug 27 00:08:37 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 00:10:40 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
Message-ID: <412E5F65.9080508@colorstudy.com>

Responding more generally on this thread; or, more generally, here's 
What The WSGI Means To Me...

It's not so much that you can attach servers and frameworks 
independently.  That's nice, but it's not a huge deal.  WSGI is, to me, 
the beginning of a common language about HTTP requests, a standard way 
to represent that request.  It's not the most awesome, easiest to use 
representation of these objects, but I don't think that's a reasonable 
goal; those qualities are too subjective.  WSGI's request and response 
are what we can manage, trying to make everyone happy.

And it's not so bad, because while it's not featureful, it's *really 
simple*.  That's a decent compromise.  The request is the environment 
dictionary that WSGI defines; the response is the status plus headers 
plus written body plus iterable body.  And it's okay that it's this 
simple, because it's a straight-forward mapping of HTTP with little 
information lost, and HTTP is obviously fairly central to this all.
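
A minimal sketch of that request/response shape follows. It assumes the
interface as later standardized (PEP 3333, Python 3 byte strings), which may
differ in detail from the draft discussed here; hello_app and call_app are
hypothetical names, and call_app stands in for a real server.

```python
def hello_app(environ, start_response):
    # The request is just the environ dictionary.
    name = environ.get("QUERY_STRING", "") or "world"
    body = ("Hello, %s!\n" % name).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The response is a status, a list of header tuples, and an iterable body.
    start_response("200 OK", headers)
    return [body]


def call_app(app, environ):
    # Stand-in for a server: capture what the application reports.
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers
        return lambda data: None  # the write() callable; unused in this sketch

    chunks = list(app(environ, start_response))
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, body = call_app(
    hello_app,
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/", "QUERY_STRING": "web-sig"},
)
```

Nothing here depends on a particular server or framework, which is the
"common language" point: anything that can build the dictionary and supply
the callback can run the application.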

But even though it's simple and adds no real features, nor does it 
enable anything new, it's still interesting because it gives us a 
standard way of communicating (programmatically).  We don't have that 
right now.  Ben's right, there's a lot of work to be done to make a 
good, simple, Python web development environment.  WSGI makes it 
possible to work towards that goal incrementally and in a distributed 
fashion, without competing.  Right now everyone who develops on a 
framework is competing with everyone developing on some other framework. 
It's just too big of a problem space to have to compete on a large 
scale, with the entire environment being take it or leave it.  But I 
don't think the developers actually *want* to compete, it's just been a 
technical necessity.

So, a bit like Phillip, I think WSGI isn't an end in itself, but it 
could be key in enabling further progress.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From mnot at mnot.net  Fri Aug 27 00:57:17 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Fri Aug 27 00:57:22 2004
Subject: [Web-SIG] Other kinds of environment variables
Message-ID: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>

One thing that seems to be missing in WSGI to me is the communication 
of the delineation between what the server does and what the 
application does.

The latest draft says:

[[[
In general, the server or gateway is responsible for ensuring that 
correct headers are sent to the client: if an application omits a 
needed header, the server or gateway *should* add it. [...] If the 
application supplies a header that the server would ordinarily supply, 
or that contradicts the server's intended behaviour [...] the server or 
gateway *may* discard the conflicting header, provided that its action 
is recorded for the benefit of the application author.
]]]

I'm a bit uncomfortable with this, because there's no standard way for 
the action to be "recorded for the benefit of the application author." 
IMO this is one of the major problems with CGI.

In other words, there's a laundry list of HTTP features that may or may 
not be handled by the server on behalf of the application, depending on 
how it's written and configured. Giving the application some idea of 
what it can expect the server to do, and how it will do it, would help 
application frameworks decide what tasks they need to take on themselves.

For example;

* HTTP auth - does the server make the Authorization header available? 
Automatically generate 401s when configured to require auth? If the 
application framework wishes to perform auth on its own, will it have 
the appropriate information available?
* chunked encoding - does the server chunk the body when appropriate?
* content-length - does the server automatically calculate it?
* cache validation - does the server handle If-Modified-Since and 
If-None-Match requests appropriately (e.g., with a 304)?
* content-encoding - does the server apply content-encoding in requests 
and/or responses as appropriate, and what schemes does it support?
* transfer-encoding - same as content-encoding

Some servers (e.g., CGI) may not be able to supply all of this 
information reliably, but others will, and it would be quite useful to 
frameworks to know the capabilities of the server in a generic fashion.

I know that this can be addressed by server-specific environment now, 
but I think there might be some low-hanging fruit for common functions 
like the ones above. It might be that they'd be better in a separate 
document, so they're not part of the 'core' WSGI, but I think there's 
real value in having some common ones.
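
One way such common keys might look, purely as a sketch: the environ key
"x.server_capabilities" and the feature names below are hypothetical and are
not defined by any WSGI draft. The idea is only that a framework could
compare what it needs against what the server says it already handles.

```python
# HYPOTHETICAL key and feature names; nothing in the WSGI draft defines them.
CAPABILITIES_KEY = "x.server_capabilities"

WANTED_FEATURES = [
    "http_auth",
    "chunked_encoding",
    "content_length",
    "cache_validation",
    "content_encoding",
]


def framework_todo(environ):
    # Return the features the framework must handle itself, in a fixed order.
    provided = set(environ.get(CAPABILITIES_KEY, ()))
    return [feature for feature in WANTED_FEATURES if feature not in provided]


# A CGI-like gateway that advertises nothing at all.
cgi_like_environ = {}
# A server that chunks bodies and computes Content-Length on its own.
richer_environ = {CAPABILITIES_KEY: {"chunked_encoding", "content_length"}}

todo_for_cgi = framework_todo(cgi_like_environ)
todo_for_richer = framework_todo(richer_environ)
```

A server that cannot report its behaviour reliably simply omits the key, and
the framework falls back to doing everything itself.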

Thoughts? If there's interest, I'll make a proposal.


--
Mark Nottingham     http://www.mnot.net/

From mnot at mnot.net  Fri Aug 27 01:03:36 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Fri Aug 27 01:03:39 2004
Subject: [Web-SIG] expect/continue
Message-ID: <291B974B-F7B4-11D8-82BE-000A95BD86C0@mnot.net>

Phillip,

I like the Expect/Continue language in the latest draft -- thanks!

One thing; the first bullet point gives servers and gateways the option 
of "Reject[ing] all client requests containing an Expect: 100-continue 
header with a '417 Expectation failed' error."

This doesn't seem like a good thing to allow, because it makes server 
implementations that take this path reject ALL requests that use 
expect/continue, with no recourse. The intent of Expect/Continue is 
that it should fall back to normal operation (the request gets sent and 
processed) unless it is explicitly rejected.

So, I think this option should be removed. I can see some scenarios 
where the server can and will be configured to reject all requests over 
a certain size, etc. but rejecting all requests that use this mechanism 
indiscriminately doesn't seem to fall into that case. If an 
implementation doesn't want to deal with expect/continue at all, it has 
two choices;

1) don't claim to be HTTP/1.1 conformant

2) wait until the client decides you don't support expect/continue, and 
sends the request body (this is suboptimal, for obvious reasons).
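
The behaviour argued for above can be sketched as a small decision
function. This is an illustration under stated assumptions, not draft text:
respond_to_expect is a hypothetical name, headers are assumed to arrive as a
dict with lower-cased keys, and 413 is used here as the explicit rejection
for an over-sized body.

```python
def respond_to_expect(request_headers, max_body_bytes):
    # Return an HTTP status to send before the body is read,
    # or None when the request carries no 100-continue expectation.
    expect = request_headers.get("expect", "").strip().lower()
    if expect != "100-continue":
        return None  # ordinary request: just process it
    try:
        length = int(request_headers.get("content-length", "0"))
    except ValueError:
        return 400  # malformed Content-Length
    if length > max_body_bytes:
        return 413  # explicit rejection: body exceeds the configured limit
    return 100  # otherwise fall back to normal operation


small = respond_to_expect({"expect": "100-continue", "content-length": "10"}, 1000)
large = respond_to_expect({"expect": "100-continue", "content-length": "5000"}, 1000)
plain = respond_to_expect({"content-length": "10"}, 1000)
```

The point of the sketch is that rejection happens only for a stated reason
(here, size); a request that merely uses Expect: 100-continue is never
refused wholesale.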

Cheers,

--
Mark Nottingham     http://www.mnot.net/

From floydophone at gmail.com  Fri Aug 27 01:19:24 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Fri Aug 27 01:19:34 2004
Subject: [Web-SIG] Re: Web-SIG Digest, Vol 10, Issue 26
In-Reply-To: <20040826202307.7DCE81E400A@bag.python.org>
References: <20040826202307.7DCE81E400A@bag.python.org>
Message-ID: <6654eac404082616195fc15079@mail.gmail.com>

I believe we can achieve the best of both worlds.

We should implement a Servlet-like interface which works atop WSGI,
which includes session management, caching, and pooling. We should
include this in the standard Python distribution and call it the
official framework.

This servlet library should have the exact same interface as a
currently existing framework. At first glance, I'd say we should just
port jonpy to WSGI and include it in the Python distribution, but
other viable alternatives are WebWare servlets, Snakelets, and
WebStack.

What do you think?

On Thu, 26 Aug 2004 22:23:07 +0200 (CEST), web-sig-request@python.org
<web-sig-request@python.org> wrote:
> Today's Topics:
> 
>   1. Regarding the WSGI draft (Ben Sizer)
>   2. Re: Regarding the WSGI draft (Mark Nottingham)
>   3. Re: Regarding the WSGI draft (Ben Sizer)
>   4. Re: Regarding the WSGI draft (Phillip J. Eby)
>   5. Re: Regarding the WSGI draft (Bob Kimble)
>   6. Re: Regarding the WSGI draft (Titus Brown)
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 26 Aug 2004 19:30:22 +0100
> From: Ben Sizer <brsizer@kylotan.eidosnet.co.uk>
> Subject: [Web-SIG] Regarding the WSGI draft
> To: web-sig@python.org
> Message-ID: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
> Content-Type: text/plain; charset=us-ascii; format=flowed
> 
> I've read through the draft and most of the messages on this list that
> followed it. However, I have a basic problem with it which I will
> attempt to summarise below.
> 
> The focus seems to be on making frameworks more portable. The abstract
> reads "This document specifies a proposed standard interface between web
> servers and Python web applications or frameworks, to promote web
> application portability across a variety of web servers." This is all
> well and good, but the implications from that point onwards are that
> we're firmly dealing with frameworks rather than applications. Phillip
> J. Eby has commented on Ian Bicking's blog that "at this stage, the
> benefits of WSGI are primarily for web *framework* authors, and web
> *server* authors, not web *application* authors. This is *not* an
> application API, it's a framework-to-server glue API."
> 
> This immediately strikes me as odd, because from my previous development
> experience frameworks are not that important. In fact, I'm heavily
> inclined to believe that Python only has a proliferation of frameworks
> because of the currently poor degree of higher level support for web
> development in general, and the various frameworks attempt to bridge
> that gap. Create better general web support for Python, and frameworks
> will only be necessary for the really heavy duty applications. Create
> the ability to make frameworks more portable, and all you do is
> encourage more people to develop more frameworks. Focusing on making
> life easier for framework developers is solving the wrong problem, in my
> opinion.
> 
> I come from an ASP and PHP background and generally speaking, a
> developer doesn't want or need a framework between their code and the
> web-scripting language when developing on those platforms. On the rare
> occasions that you do use a framework (such as PHP-Nuke) it's because
> you want to simplify high level activities like news management and user
> lists, and allow people to add content without needing to know HTML. By
> contrast Python's frameworks tend to address the trivial, low level
> things that should fall under the 'batteries included' philosophy that
> Python subscribes to.
> 
> The front page of this Python Web-SIG suggests, "pick a Web framework
> that already exists, make a functionality checklist from it, and add
> that functionality to a new webserver module." I think that's what is
> needed most of all - some sort of standard approach that new Python
> programmers can jump right in and use, which doesn't require choosing
> one of several different frameworks.
> 
> What I'd like to see is something mirroring the Python Database API. For
> instance, I might have to change "import MySQLdb" to "import pyPgSQL"
> but I know that 99% of the rest of the database code will work fine. As
> a web developer I would like to be able to change "import cgi" to
> "import mod_python" or "import fastcgi" and know that, if I follow a
> standard set of calls, I will have a simple and standard way of
> producing a web document. The standardised access to the output and
> input streams in the current draft is all well and good but there's
> little point in me making use of that abstraction if I still have to
> rely on extra modules for access to useful higher-level concepts such as:
> 
> - dispatching control flow based on the URI
> - session management and cookies
> - GET/query string parsing
> - POST/form parsing
> - ASP + PHP style templating
> 
> If these things are coming soon in future WSGI drafts, then great! But I
> got the impression that these features were being delegated out to the
> legion of frameworks.
> 
> I am aware that this all sounds very negative, and I don't mean to
> criticise the hard work that Phillip and others have put into this draft
> specification. I just worry that it diverts attention from what I
> consider to be the real issue facing Python on the web, which is making
> life easier for web application developers, not framework developers.
> 
> --
> Ben Sizer
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 26 Aug 2004 11:51:06 -0700
> From: Mark Nottingham <mnot@mnot.net>
> Subject: Re: [Web-SIG] Regarding the WSGI draft
> To: Ben Sizer <brsizer@kylotan.eidosnet.co.uk>
> Cc: web-sig@python.org
> Message-ID: <E332D1A0-F790-11D8-82BE-000A95BD86C0@mnot.net>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
> 
> Hi Ben,
> 
> I understand where you're coming from, but I think we're in a different
> situation here. There are a lot of different ways
> that you can construct an application framework; there is no "one true
> way," because people have varying requirements for a Web application.
> 
> Contrast this with databases, which are for the most part a commodity;
> you can plug in different databases because they all have the same
> conceptual model of how a database works.
> 
> There has been some progress towards convergence on a common view of
> what a Web application is, but I still think we have a ways to go, and
> much to learn, before any one application framework can declare
> victory.
> 
> That being the case, WSGI provides something that's incredibly
> valuable; as long as it maintains the right level of abstraction, it
> allows application frameworks to avoid worrying about the details of a
> particular server implementation.
> 
> I'm pleased as punch with it, because it lets me avoid doing that when
> I write my own application framework (details forthcoming ;).
> 
> Cheers,
> 
> --
> Mark Nottingham     http://www.mnot.net/
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 26 Aug 2004 20:46:07 +0100
> From: Ben Sizer <brsizer@kylotan.eidosnet.co.uk>
> Subject: Re: [Web-SIG] Regarding the WSGI draft
> To: Mark Nottingham <mnot@mnot.net>
> Cc: web-sig@python.org
> Message-ID: <412E3DFF.2000605@kylotan.eidosnet.co.uk>
> Content-Type: text/plain; charset=us-ascii; format=flowed
> 
> Mark Nottingham wrote:
> 
> > I understand where you're coming from, but I think we're in a different
> > situation here. There are a lot of different ways
> > that you can construct an application framework; there is no "one true
> > way," because people have varying requirements for a Web application.
> 
> ....
> 
> > There has been some progress towards convergence on a common view of
> > what a Web application is, but I still think we have a ways to go, and
> > much to learn, before any one application framework can declare victory.
> 
> Although what you say makes sense on the surface, the fact remains that
> technologies such as ASP and PHP are popular and useful because they
> present a simple and standard interface to the user, whether that user
> is writing a 4 line script, a small application, or a large framework
> upon which to base other applications. With Python you seem stuck with
> two equally unappealing options: slow CGI if you want a simple script,
> where simple is relative since you need to fool around with os.environ,
> printing your own headers, etc - or a complex and idiosyncratic
> framework if you want anything non-trivial, but which is often just as
> complex as PHP straight out of the box, except with a much smaller user
> base and generally less documentation.
> 
> For example, you know that $_GET[varName] is going to be the standard
> way of accessing a querystring variable in PHP. Yet in Python it could
> be part of a request.form dictionary, or
> cgi.parse_qs(os.environ['QUERY_STRING']), or
> modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query
> strings are part of the RFC2396 standard, so why not have a standard
> module or interface to present to the user?
> 
> I don't see any good reason for this sort of variance, except that
> there's a bias towards accommodating these existing frameworks rather
> than enabling simpler applications of the future, and which I think is a
> symptom of the problem rather than part of the solution.
> 
> --
> Ben Sizer.
> 
> ------------------------------
> 
> Message: 4
> Date: Thu, 26 Aug 2004 15:59:07 -0400
> From: "Phillip J. Eby" <pje@telecommunity.com>
> Subject: Re: [Web-SIG] Regarding the WSGI draft
> To: Ben Sizer <brsizer@kylotan.eidosnet.co.uk>, web-sig@python.org
> Message-ID: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
> Content-Type: text/plain; charset="us-ascii"; format=flowed
> 
> At 07:30 PM 8/26/04 +0100, Ben Sizer wrote:
> >I just worry that it diverts attention from what I consider to be the real
> >issue facing Python on the web, which is making life easier for web
> >application developers, not framework developers.
> 
> Unfortunately, every effort to date to create a "framework to end all
> frameworks" has simply resulted in the existence of framework
> N+1.  Why?  Because the creation of a *new* framework means that there is
> no existing code that uses it.  And if the framework only provides features
> that others already have, there's no compelling reason to switch.
> 
> Any approach that ignores the economic reality of present-day Python web
> apps, and provides no way for them to migrate gradually to a new standard,
> is doomed to niche status at best.  (Comparison to ASP and PHP is
> misleading: both had standards for dispatching, sessions, cookies, form
> parsing, and templating *when they were created*, so there was no legacy
> codebase using alternative solutions that had to be migrated.)
> 
> And so, the only way we're going to "steal" the marketshare of existing
> frameworks is with the consent and co-operation of the developers of those
> frameworks.  That means there has to be enough benefit for them to justify
> the effort of getting on board.
> 
> So, please allow me to reveal my top-secret plan for total world
> domination...  :)
> 
> First, the current situation.  Choice of framework is a high investment for
> users, because once they choose, they are stuck with that framework and
> possibly server.  The cost to switch is extremely high.  It's almost as
> though every plumbing manufacturer makes their own sizes of pipes and
> connectors, so once you choose a vendor, you're stuck with them.
> 
> WSGI changes this scenario by introducing competitive pressure to the
> server/framework choice.  As soon as enough framework and server developers
> participate, the others are pushed by network effects to do the
> same.  Users ask, "Why can't I use your framework in any WSGI server?" and
> "Why can't I use any WSGI framework in your server?", pushing the slower
> adopters to either join up or be marginalized.
> 
> But this is just the first phase: standardizing on a size for one kind of
> pipe.  It's not very glamorous, but it fundamentally changes the
> marketplace, and causes many things to appear to spontaneously happen "on
> their own".
> 
> First, users can experiment with other frameworks, especially if those
> frameworks are lightweight.  This builds competitive pressure in the
> direction of lightweight, easy-to-integrate frameworks.  So framework
> developers begin to break their monolithic approaches down into smaller
> pieces that operate on segments of WSGI.  For example, a session service
> that you pass the incoming 'environ' and outgoing 'headers' to, in order
> for it to read and set cookies.  (Notice that this *isn't* a WSGI-defined
> or standardized service, just a service implemented *in terms of* WSGI.)
> 
> Such a service makes little sense to implement today, but people will
> spontaneously begin developing such services once WSGI is a ubiquitous part
> of the Python web development landscape.  It's the most natural thing in
> the world for them to do so, not only because it means a wider audience for
> their service, but because they're likely developing it for a WSGI-based
> environment they're already using.  What other platform would they write it
> for?
> 
> Because these services will be interchangeable to some degree, lock-in is
> limited and competition will determine a winner or winners.  Then, if the
> winners are sufficiently similar to allow useful standardization, that's
> the natural next step.  But, for some services, the differences will be
> important qualitative differences, and standardization would reduce
> meaningful choice.  We don't know in advance what these services should be,
> and we don't know enough to standardize on them now.
> 
> For someone with an ASP or PHP background, that last statement at least
> might sound like sheer lunacy.  But, Python web frameworks have often
> pioneered techniques years ahead of their appearance in ASP, PHP, and Java
> frameworks.  I would hate for us to lose our next great innovation to
> premature standardization.
> 
> But luckily, I don't need to worry: there's simply no way you'll get enough
> Python framework developers (and their users) to agree on such a
> standardization.  For one thing, it's not in their best interests to do
> so.  (Don't let me discourage you from trying, though, if that's what you
> want to do.  I just don't think you'll have much success, and am not
> interested in trying it myself.)
> 
> Anyway, there it is.  My secret plan to fundamentally alter the Python web
> programming universe through secret mind-control market manipulation and
> social engineering.  You found me out.  Now I'll have to kill you.*  :)
> 
> * "And I'd have gotten away with it too, if it hadn't been for those
> meddling kids..."
> 
> (Disclaimer for non-US readers: the above is a humorous reference to an
> American TV cartoon that featured a different character saying this line
> each week, after their nefarious plans were foiled.  It's not me calling
> anybody a meddling kid, or threatening to actually kill anyone!)
> 
> ------------------------------
> 
> Message: 5
> Date: Thu, 26 Aug 2004 16:11:38 -0400
> From: Bob Kimble <rjkimble@alum.mit.edu>
> Subject: Re: [Web-SIG] Regarding the WSGI draft
> To: web-sig@python.org
> Message-ID: <200408261611.38389.rjkimble@alum.mit.edu>
> Content-Type: text/plain;  charset="iso-8859-1"
> 
> On Thursday 26 August 2004 03:46 pm, Ben Sizer wrote:
> > Mark Nottingham wrote:
> > > I understand where you're coming from, but I think we're in a different
> > > situation here. There are a lot of different ways
> > > that you can construct an application framework; there is no "one true
> > > way," because people have varying requirements for a Web application.
> >
> > ...
> >
> > > There has been some progress towards convergence on a common view of
> > > what a Web application is, but I still think we have a ways to go, and
> > > much to learn, before any one application framework can declare victory.
> >
> > Although what you say makes sense on the surface, the fact remains that
> > technologies such as ASP and PHP are popular and useful because they
> > present a simple and standard interface to the user, whether that user
> > is writing a 4 line script, a small application, or a large framework
> > upon which to base other applications. With Python you seem stuck with
> > two equally unappealing options: slow CGI if you want a simple script,
> > where simple is relative since you need to fool around with os.environ,
> > printing your own headers, etc - or a complex and idiosyncratic
> > framework if you want anything non-trivial, but which is often just as
> > complex as PHP straight out of the box, except with a much smaller user
> > base and generally less documentation.
> >
> > For example, you know that $_GET[varName] is going to be the standard
> > way of accessing a querystring variable in PHP. Yet in Python it could
> > be part of a request.form dictionary, or
> > cgi.parse_qs(os.environ['QUERY_STRING']), or
> > modpython.util.parse_qs(req.parsed_uri[7]), etc. Yet we know that query
> > strings are part of the RFC2396 standard, so why not have a standard
> > module or interface to present to the user?
> >
> > I don't see any good reason for this sort of variance, except that
> > there's a bias towards accommodating these existing frameworks rather
> > than enabling simpler applications of the future, and which I think is a
> > symptom of the problem rather than part of the solution.
> 
> I have been reading this thread for a while now, and I haven't commented
> because I have done absolutely no web development using Python. However,
> Mark's comments strike me as being dead on. I'm used to the Java Servlet API,
> which creates an API for servlets and JSP pages. The fact that there are
> several high quality application servers that all support this API suggests
> to me that creating something similar for Python makes a lot of sense. I have
> written JSP's and servlets and run them under Tomcat, but I know that I could
> just as easily run them under WebSphere, WebLogic, JRun, or any others that
> support the API. It seems to me that creating a similar API for Python would
> be terrific. Of course, somebody would also have to write an application
> server to support the API, but I suspect some of the existing frameworks
> could be revamped to support it. Anyway, that's my 2 cents. I would love to
> see something similar to Tomcat and the Java Servlet API for Python.
> 
> ------------------------------
> 
> Message: 6
> Date: Thu, 26 Aug 2004 13:25:10 -0700
> From: Titus Brown <titus@caltech.edu>
> Subject: Re: [Web-SIG] Regarding the WSGI draft
> To: Bob Kimble <rjkimble@alum.mit.edu>
> Cc: web-sig@python.org
> Message-ID: <20040826202510.GA5704@caltech.edu>
> Content-Type: text/plain; charset=us-ascii
> 
> -> I have been reading this thread for a while now, and I haven't commented
> -> because I have done absolutely no web development using Python. However,
> -> Mark's comments strike me as being dead on. I'm used to the Java Servlet API,
> -> which creates an API for servlets and JSP pages. The fact that there are
> -> several high quality application servers that all support this API suggests
> -> to me that creating something similar for Python makes a lot of sense. I have
> -> written JSP's and servlets and run them under Tomcat, but I know that I could
> -> just as easily run them under WebSphere, WebLogic, JRun, or any others that
> -> support the API. It seems to me that creating a similar API for Python would
> -> be terrific. Of course, somebody would also have to write an application
> -> server to support the API, but I suspect some of the existing frameworks
> -> could be revamped to support it. Anyway, that's my 2 cents. I would love to
> -> see something similar to Tomcat and the Java Servlet API for Python.
> 
> <delurk>
> 
> I've implemented packages at the adapter level (PyWX), the framework
> level (crud that was never released because I found Quixote first), and
> the content level (based variously on CGI, WebWare, and Quixote).
> 
> I'm moderately skeptical of the short term use of the API being
> developed on this list, because in practice it is relatively easy
> to implement a framework that fits on top of all of the existing
> adapters (CGI, mod_python, etc.)  Medium term, I think it will lead
> to a welcome homogenization of server <--> adapter <--> framework
> interaction, and so I think it's a valuable concept.
> 
> The idea of having a single framework (like Java's "servlets") is, I
> think, silly.  Having implemented sites in several of the existing
> frameworks, it is clear that there are several different ways to
> conceptualize the development of Web sites: the Quixote style and
> the WebWare style are two very distinct examples.  Anything that cuts
> down on the variety of available frameworks is going to restrict the
> options, which is bad.
> 
> However, I think it is incumbent upon the developers and users of the
> different frameworks to clearly distinguish between the various options.
> Right now it is very confusing to me, and I've been developing Web sites
> in Python for 5 years ;).
> 
> I'm very confused as to why you need multiple servlet implementations in
> Java.  Wouldn't one do just as well as 10?  It sounds like having 5
> different implementations of the 'os' module in Python...
> 
> --titus
> 
From jim-web-sig at jimdabell.com  Fri Aug 27 02:03:08 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Fri Aug 27 01:56:15 2004
Subject: [Web-SIG] Re: Web-SIG Digest, Vol 10, Issue 26
In-Reply-To: <6654eac404082616195fc15079@mail.gmail.com>
References: <20040826202307.7DCE81E400A@bag.python.org>
	<6654eac404082616195fc15079@mail.gmail.com>
Message-ID: <200408270103.08455.jim-web-sig@jimdabell.com>


[Please trim responses in future; you didn't have to quote the whole digest to 
us.]

> We should implement a Servlet-like interface which works atop WSGI,

Fair enough, but as it would sit on top of WSGI, nobody need concern themselves 
with it until WSGI is finished.  Otherwise you are trying to hit a moving 
target.

The point of WSGI isn't that development stops after it's finished, but rather 
that everyone is on the same page before attempting something more ambitious 
like you describe.  Larger projects take more time to mature and give more 
scope for fundamental disagreements.  Something like WSGI can be specified, 
implemented and standardised relatively quickly, meaning there are 
incremental, measurable improvements, rather than everybody waiting around 
for the perfect system to be born.

Obviously your proposed servlet-like interface's requirements are a factor in 
what should go into WSGI, but I see no reason to believe your servlet-like 
interface would have significantly different requirements to all the other 
frameworks.


-- 
Jim Dabell

From ianb at colorstudy.com  Fri Aug 27 02:03:36 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 02:03:42 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <412E7A58.6030800@colorstudy.com>

Mark Nottingham wrote:
> One thing that seems to be missing in WSGI to me is the communication of 
> the delineation between what the server does and what the application does.
> 
> The latest draft says:
> 
> [[[
> In general, the server or gateway is responsible for ensuring that 
> correct headers are sent to the client: if an application omits a needed 
> header, the server or gateway *should* add it. [...] If the application 
> supplies a header that the server would ordinarily supply, or that 
> contradicts the server's intended behaviour [...] the server or gateway 
> *may* discard the conflicting header, provided that its action is 
> recorded for the benefit of the application author.
> ]]]
> 
> I'm a bit uncomfortable with this, because there's no standard way for 
> the action to be "recorded for the benefit of the application author." 
> IMO this is one of the major problems with CGI.

The closest thing to a standard would be, I think, 
environ['wsgi.errors'].  I would expect errors about the 
application to be sent there.  I also think it's reasonable not to 
specify it further than this -- many error logging facilities are 
possible, and it's all very server-specific.
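
As a rough illustration of "recording" a discarded header, here is a minimal
sketch, not anything taken from the draft.  The helper name
drop_conflicting_headers and the set of reserved names are hypothetical; the
only real assumption is that the server hands the application a file-like
error stream, which the WSGI spec as later published exposes as
environ['wsgi.errors'].

```python
import io


def drop_conflicting_headers(headers, reserved, errors):
    """Return the headers minus any reserved names, logging each one dropped.

    `reserved` is a set of lower-case header names the server insists on
    supplying itself; `errors` is a file-like object such as the stream a
    server would place in environ['wsgi.errors'].
    """
    kept = []
    for name, value in headers:
        if name.lower() in reserved:
            # Record the action for the benefit of the application author.
            errors.write("discarded application header %s: %s\n" % (name, value))
        else:
            kept.append((name, value))
    return kept


errors = io.StringIO()  # stands in for the server-provided error stream
headers = [("Content-Type", "text/plain"), ("Connection", "keep-alive")]
kept = drop_conflicting_headers(headers, {"connection"}, errors)
```

Where the stream ends up (a log file, syslog, the console) stays entirely
server-specific, which is the point being made above.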

> In other words, there's a laundry list of HTTP features that may or may 
> not be handled by the server on behalf of the application, depending on 
> how it's written and configured. Giving the application some idea of 
> what it can expect the server to do, and how it will do it, would help 
> application frameworks decide which tasks they need to take on themselves.

But then this is a different issue.  I think Phillip likes the idea of 
"configuration" for this.  I give it scare quotes because I think 
Phillip thinks about configuration somewhat differently than most 
people, and configuration plays a different sort of role in PEAK (and 
Zope 3).  It's a way of plugging pieces together, rather than just a way 
of indicating installation-specific values.

But, an earlier WSGI interface didn't have wsgi.threaded or 
wsgi.multiprocess, and I think it would actually be hard to work without 
these.

> For example;
> 
> * HTTP auth - does the server make the Authorization header available? 
> Automatically generate 401s when configured to require auth? If the 
> application framework wishes to perform auth on its own, will it have 
> the appropriate information available?

If the server does not provide the Authorization header, that would be 
useful to know.  Of course, sometimes you can't know that -- a CGI 
script doesn't know how its parent is configured.  Using Apache, you can 
configure it both ways for CGI scripts (and I think they even make this 
easier and more explicit in Apache 2, so you shouldn't just expect it to 
always be off).

But I can appreciate the annoyance when you don't know if HTTP auth will 
work, or when you're new to this (or come from someplace like PHP) and you 
just go nuts trying to figure out why the software won't let you log in.

> * chunked encoding - does the server chunk the body when appropriate?
> * content-length - does the server automatically calculate it?

These seem useful.

> * cache validation - does the server handle If-Modified-Since and 
> If-None-Match requests appropriately (e.g., with a 304)?

I would almost certainly expect this to be false.  There may be some 
WSGI servers that have an extended notion of the application, so they 
can look at things like the modification date.  But those are likely to 
be uncommon -- more likely only applications will know the necessary 
information.

> * content-encoding - does the server apply content-encoding in requests 
> and/or responses as appropriate, and what schemes does it support?
> * transfer-encoding - same as content-encoding

Again, seems useful.  What harm would there be if you assume they don't, 
or assume they do?  I haven't thought this part through.

When I think of middleware, I can think of many things like this.  In 
most cases, I'd add a key, and if the key wasn't present I'd know it was 
false.  But it can be odd.  Say I have a middleware that catches 
exceptions, because that's my one example at the moment.  If it is 
present, it would be nice if other applications didn't catch exceptions, 
and let them propagate all the way up.  So, the application looks for 
environ.get('ianb_middleware.exception_catcher')?  That's weird, because 
someone else comes along and makes their own exception catcher that 
works like mine; what key do they use?  It would be nice if we used the 
same key.

But then, at this point I might suggest we use 
'webapp0.exception_catcher', leading up to a Web App standard that 
defines the meaning for a bunch more keys ('webapp1.exception_catcher' 
once we agree on a standard).
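
To make the shared-key idea concrete, here is a small sketch under stated
assumptions.  The key 'webapp0.exception_catcher' is the one floated above and
is not part of any agreed standard; the function names are hypothetical.  It
only covers exceptions raised while the application is called, not ones raised
later while the response is iterated.

```python
KEY = "webapp0.exception_catcher"  # proposed key, not a standard


def exception_catcher(app):
    """Middleware that advertises itself in environ and catches errors."""
    def wrapped(environ, start_response):
        environ[KEY] = True  # tell inner applications a catcher is present
        try:
            return app(environ, start_response)
        except Exception:
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"caught by middleware"]
    return wrapped


def app(environ, start_response):
    """Application that defers to an outer catcher when one is advertised."""
    try:
        raise ValueError("boom")  # simulated failure
    except ValueError:
        if environ.get(KEY):
            raise  # let the exception propagate up to the middleware
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")])
        return [b"caught by application"]
```

Any catcher that sets the same key would work with this application, which is
why agreeing on the key name matters more than who wrote the middleware.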

Anyway, that's my theory on how this might go.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From jim-web-sig at jimdabell.com  Fri Aug 27 02:20:30 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Fri Aug 27 02:13:38 2004
Subject: [Web-SIG] Servers ignoring application-supplied headers
Message-ID: <200408270120.30569.jim-web-sig@jimdabell.com>


> In general, the server or gateway is responsible for ensuring that correct
> headers are sent to the client: if the application omits a needed header,
> the server or gateway *should* add it.  For example, the HTTP ``Date:`` and
> ``Server:`` headers would normally be supplied by the server or gateway.  If
> the application supplies a header that the server would ordinarily supply,
> or that contradicts the server's intended behavior (e.g. supplying a
> different ``Connection:`` header), the server or gateway *may* discard the
> conflicting header, provided that its action is recorded for the benefit of
> the application author.

Is this wise?  It's not really the WSGI's job to nanny the application and 
make sure it does the right thing.  I can see the case for supplying default 
values, but simply throwing away something it's specifically been asked to 
use seems rather shortsighted.  WSGI authors aren't perfect, and it's far too 
easy to end up in a situation where application developers are stuck behind a 
clueless WSGI that insists on ignoring certain things because it thinks it's 
the right thing to do.  It seems to me that if the application developers 
want to do something, WSGI shouldn't make it intentionally impossible for 
them to do.

The worst that is likely to happen is the application developer tries 
something and it breaks, so he doesn't try it again, right?


-- 
Jim Dabell

From mnot at mnot.net  Fri Aug 27 02:29:25 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Fri Aug 27 02:29:29 2004
Subject: [Web-SIG] Servers ignoring application-supplied headers
In-Reply-To: <200408270120.30569.jim-web-sig@jimdabell.com>
References: <200408270120.30569.jim-web-sig@jimdabell.com>
Message-ID: <26460C34-F7C0-11D8-82BE-000A95BD86C0@mnot.net>

I assume that this part was written with CGI in mind. Not to say that 
we shouldn't do better than CGI when possible...

On Aug 26, 2004, at 5:20 PM, Jim Dabell wrote:

>> In general, the server or gateway is responsible for ensuring that 
>> correct
>> headers are sent to the client: if the application omits a needed 
>> header,
>> the server or gateway *should* add it.  For example, the HTTP 
>> ``Date:`` and
>> ``Server:`` headers would normally be supplied by the server or 
>> gateway.  If
>> the application supplies a header that the server would ordinarily 
>> supply,
>> or that contradicts the server's intended behavior (e.g. supplying a
>> different ``Connection:`` header), the server or gateway *may* 
>> discard the
>> conflicting header, provided that its action is recorded for the 
>> benefit of
>> the application author.
>
> Is this wise?  It's not really the WSGI's job to nanny the application 
> and
> make sure it does the right thing.  I can see the case for supplying 
> default
> values, but simply throwing away something it's specifically been 
> asked to
> use seems rather shortsighted.  WSGI authors aren't perfect, and it's 
> far too
> easy to end up in a situation where application developers are stuck 
> behind a
> clueless WSGI that insists on ignoring certain things because it 
> thinks it's
> the right thing to do.  It seems to me that if the application 
> developers
> want to do something, WSGI shouldn't make it intentionally impossible 
> for
> them to do.
>
> The worst that is likely to happen is the application developer tries
> something and it breaks, so he doesn't try it again, right?

--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Fri Aug 27 03:42:22 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 03:42:00 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>

At 03:57 PM 8/26/04 -0700, Mark Nottingham wrote:

>* HTTP auth - does the server make the Authorization header available? 
>Automatically generate 401s when configured to require auth? If the 
>application framework wishes to perform auth on its own, will it have the 
>appropriate information available?

This is already a problem today, I'm afraid.  For example, Apache 1.x 
doesn't normally supply this header to CGI applications at least.  (Which 
is really silly, IMO, because using REMOTE_USER instead can lead to 
serious security issues in shared hosting environments.)

Anyway, I think this is one that has to remain an unspecified 
deployment-specific issue.  No sane framework targeting multiple web 
servers is going to rely solely on HTTP basic-auth if it can avoid it 
anyway.  Basic-auth sucks on far too many levels.  I'm not saying that it 
doesn't have its niche, I'm just saying that I don't think we can make any 
guarantees about it in the WSGI spec without breaking something.

>* chunked encoding - does the server chunk the body when appropriate?
>* content-length - does the server automatically calculate it?

There's a section on both of these in the current draft, just not the last 
one I posted.  I sent a copy to peps@python.org, but haven't gotten a reply 
yet.

Here's the relevant section from the latest draft:

"""Handling the ``Content-Length`` Header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the application does not supply a ``Content-Length`` header, a
server or gateway may choose one of several approaches to handling
it.  The simplest of these is to close the client connection when
the response is completed.

Under some circumstances, however, the server or gateway may be
able to either generate a ``Content-Length`` header, or at least
avoid the need to close the client connection.  If the application
does *not* call the ``write()`` callable, and returns an iterable
whose ``len()`` is 1, then the server can automatically determine
``Content-Length`` by taking the length of the first string yielded
by the iterable.

And, if the server and client both support HTTP/1.1 "chunked
encoding" [3]_, then the server *may* use chunked encoding to send
a chunk for each ``write()`` call or string yielded by the iterable,
thus generating a ``Content-Length`` header for each chunk.  This
allows the server to keep the client connection alive, if it wishes
to do so.  Note that the server *must* comply fully with RFC 2616 when
doing this, or else fall back to one of the other strategies for
dealing with the absence of ``Content-Length``.
"""
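
The single-element case in the quoted section could be sketched roughly as
follows.  This is only an illustration of the rule, not code from any server:
the function name finish_headers and its write_called flag are hypothetical,
and byte strings are used for the body where the draft simply says "string".

```python
def finish_headers(headers, result, write_called):
    """Add Content-Length when the draft's single-element rule applies.

    Returns a (headers, result) pair.  The headers are left untouched when
    the application already supplied Content-Length, called write(), or
    returned an iterable whose len() is not exactly 1.
    """
    names = {name.lower() for name, _ in headers}
    if "content-length" in names or write_called:
        return list(headers), result
    try:
        single = len(result) == 1
    except TypeError:
        single = False  # generators and similar iterables have no len()
    if not single:
        return list(headers), result
    body = next(iter(result))  # the one and only string in the iterable
    return list(headers) + [("Content-Length", str(len(body)))], [body]


new_headers, new_result = finish_headers(
    [("Content-Type", "text/plain")], [b"hello"], write_called=False)
```

In every other case the server is back to the remaining strategies the section
lists: chunked encoding, or closing the connection when the response is done.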


>* cache validation - does the server handle If-Modified-Since and 
>If-None-Match requests appropriately (e.g., with a 304)?

IMO this is an application responsibility; given dynamic content, how can 
the server verify these?


>* content-encoding - does the server apply content-encoding in requests 
>and/or responses as appropriate, and what schemes does it support?
>* transfer-encoding - same as content-encoding

Do you have any suggestions?  My assumption is that the server should 
"first do no harm".  That is, the server shouldn't silently "value-add" 
encodings unless it's absolutely sure it's okay to do so, or a human has 
configured it to do so.


>I know that this can be addressed by server-specific environment now, but 
>I think there might be some low-hanging fruit for common functions like 
>the ones above. It might be that they'd be better in a separate document, 
>so they're not part of the 'core' WSGI, but I think there's real value in 
>having some common ones.

I think it certainly would be useful to have a comprehensive set of 
guidelines for how to use, provide, or apply HTTP/1.1 features in 
WSGI.  Judging from your input so far, I'd say you have a better handle on 
the subject than I do, so your contribution would be very welcome.  It may 
indeed make sense to create a separate PEP for them, since they will mainly 
be needed by server authors and by people who need to make use of some set 
of HTTP/1.1 features.

Other areas that need to be addressed within HTTP/1.1 probably also 
include things like byte ranges.

From pje at telecommunity.com  Fri Aug 27 03:43:37 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 03:43:14 2004
Subject: [Web-SIG] Re: expect/continue
In-Reply-To: <291B974B-F7B4-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <5.1.1.6.0.20040826214248.02ed37b0@mail.telecommunity.com>

At 04:03 PM 8/26/04 -0700, Mark Nottingham wrote:
>So, I think this option should be removed. I can see some scenarios where 
>the server can and will be configured to reject all requests over a 
>certain size, etc. but rejecting all requests that use this mechanism 
>indiscriminately doesn't seem to fall into that case. If an implementation 
>doesn't want to deal with expect/continue at all, it has two choices;
>
>1) don't claim to be HTTP/1.1 conformant
>
>2) wait until the client decides you don't support expect/continue, and 
>sends the request body (this is suboptimal, for obvious reasons).

Sounds pretty good to me; why don't you just pull all the HTTP/1.1 stuff 
from WSGI and use it as a skeleton for starting your HTTP/1.1 guidelines 
document?  :)

From jim-web-sig at jimdabell.com  Fri Aug 27 04:17:02 2004
From: jim-web-sig at jimdabell.com (Jim Dabell)
Date: Fri Aug 27 04:10:36 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
Message-ID: <200408270317.02905.jim-web-sig@jimdabell.com>

On Friday 27 August 2004 02:42, Phillip J. Eby wrote:
> At 03:57 PM 8/26/04 -0700, Mark Nottingham wrote:
> >* cache validation - does the server handle If-Modified-Since and
> >If-None-Match requests appropriately (e.g., with a 304)?
>
> IMO this is an application responsibility; given dynamic content, how can
> the server verify these?

In my opinion this is a middleware responsibility.  Look at the headers 
supplied by the client, put off beginning the response until all the response 
headers are retrieved from the application, and respond with a 304 where 
appropriate.
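
A minimal sketch of such middleware, under several simplifying assumptions: it
follows the application signature of the WSGI spec as later published, buffers
the whole body, handles only an exact single-value If-None-Match comparison
(no lists, "*", weak validators or If-Modified-Since), and the names
conditional_get and demo_app are hypothetical.

```python
def conditional_get(app):
    """Wrap a WSGI app so matching If-None-Match requests get a 304."""
    def wrapped(environ, start_response):
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back instead of sending them.
            captured["status"] = status
            captured["headers"] = list(headers)
            return written.append  # stands in for the write() callable

        result = app(environ, capture)
        try:
            chunks = list(result)  # buffers the body: fine for a sketch
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        status, headers = captured["status"], captured["headers"]
        etag = dict((k.lower(), v) for k, v in headers).get("etag")
        wanted = environ.get("HTTP_IF_NONE_MATCH")
        if (environ.get("REQUEST_METHOD") in ("GET", "HEAD")
                and status.startswith("200")
                and etag is not None and wanted == etag):
            # Drop Content-* headers: a 304 carries no body.
            kept = [(k, v) for k, v in headers
                    if not k.lower().startswith("content-")]
            start_response("304 Not Modified", kept)
            return []
        start_response(status, headers)
        return written + chunks
    return wrapped


def demo_app(environ, start_response):
    start_response("200 OK",
                   [("Content-Type", "text/plain"), ("ETag", '"v1"')])
    return [b"hello"]
```

Note that this saves bandwidth only: the application still does all the work
of producing the body before the middleware throws it away.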

Are there any situations you can think of where an application would want to 
generate a matching Last-Modified or ETag header but not generate a 304?  If 
that happens, what stops an intermediate proxy from throwing the response 
body away and responding with a 304 itself?


> >* content-encoding - does the server apply content-encoding in requests
> >and/or responses as appropriate, and what schemes does it support?
> >* transfer-encoding - same as content-encoding
>
> Do you have any suggestions?  My assumption is that the server should
> "first do no harm".  That is, the server shouldn't silently "value-add"
> encodings unless it's absolutely sure it's okay to do so, or a human has
> configured it to do so.

I think the constraints RFC 2616 puts on HTTP proxies should apply to 
servers/middleware because that's essentially what they are.  Basically, any 
transformation can occur as long as the server/middleware understands the 
relevant parts of the protocol, even to the point of transforming from one 
media type to another (as long as cache-control: no-transform isn't 
encountered, of course).

If a downstream proxy can make comprehensive changes to the message without 
any authorisation beyond sitting between the two parties, I think 
servers/middleware should be at least as free.


> >I know that this can be addressed by server-specific environment now, but
> >I think there might be some low-hanging fruit for common functions like
> >the ones above. It might be that they'd be better in a separate document,
> >so they're not part of the 'core' WSGI, but I think there's real value in
> >having some common ones.
>
> I think it certainly would be useful to have a comprehensive set of
> guidelines for how to use, provide, or apply HTTP/1.1 features in
> WSGI.

I agree that having this in a separate document is the best approach, but I 
don't think that it's something specific to WSGI.  Last time I checked, the 
Atom guys also felt the need to have a best practices document in relation to 
HTTP usage, so perhaps a collaboration is in order?  I seem to remember a 
"common pitfalls" type document from the W3C from a few years ago as well, 
but I've failed to dig anything up so far.


-- 
Jim Dabell

From mnot at mnot.net  Fri Aug 27 05:44:38 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Fri Aug 27 05:44:42 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
Message-ID: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net>


On Aug 26, 2004, at 6:42 PM, Phillip J. Eby wrote:
>> * HTTP auth - does the server make the Authorization header 
>> available? Automatically generate 401s when configured to require 
>> auth? If the application framework wishes to perform auth on its own, 
>> will it have the appropriate information available?
>
> This is already a problem today, I'm afraid.  For example, Apache 1.x 
> doesn't normally supply this header to CGI applications at least.  
> (Which is really silly, IMO, because using REMOTE_USER instead can 
> lead to serious security issues in shared hosting environments.)
>
> Anyway, I think this is one that has to remain an unspecified 
> deployment-specific issue.  No sane framework targeting multiple web 
> servers is going to rely solely on HTTP basic-auth if it can avoid it 
> anyway.  Basic-auth sucks on far too many levels.  I'm not saying that 
> it doesn't have its niche, I'm just saying that I don't think we can 
> make any guarantees about it in the WSGI spec without breaking 
> something.

Digest auth sucks much less, and also uses REMOTE_USER.


>> * chunked encoding - does the server chunk the body when appropriate?
>> * content-length - does the server automatically calculate it?
>
> There's a section on both of these in the current draft, just not the 
> last one I posted.  I sent a copy to peps@python.org, but haven't 
> gotten a reply yet.
>
> Here's the relevant section from the latest draft:
>
> """Handling the ``Content-Length`` Header
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> If the application does not supply a ``Content-Length`` header, a
> server or gateway may choose one of several approaches to handling
> it.  The simplest of these is to close the client connection when
> the response is completed.
>
> Under some circumstances, however, the server or gateway may be
> able to either generate a ``Content-Length`` header, or at least
> avoid the need to close the client connection.  If the application
> does *not* call the ``write()`` callable, and returns an iterable
> whose ``len()`` is 1, then the server can automatically determine
> ``Content-Length`` by taking the length of the first string yielded
> by the iterable.
>
> And, if the server and client both support HTTP/1.1 "chunked
> encoding" [3]_, then the server *may* use chunked encoding to send
> a chunk for each ``write()`` call or string yielded by the iterable,
> thus generating a ``Content-Length`` header for each chunk.  This
> allows the server to keep the client connection alive, if it wishes
> to do so.  Note that the server *must* comply fully with RFC 2616 when
> doing this, or else fall back to one of the other strategies for
> dealing with the absence of ``Content-Length``.
> """

Looks good.


>> I know that this can be addressed by server-specific environment now, 
>> but I think there might be some low-hanging fruit for common 
>> functions like the ones above. It might be that they'd be better in a 
>> separate document, so they're not part of the 'core' WSGI, but I 
>> think there's real value in having some common ones.
>
> I think it certainly would be useful to have a comprehensive set of 
> guidelines for how to use, provide, or apply HTTP/1.1 features in 
> WSGI.  Judging from your input so far, I'd say you have a better 
> handle on the subject than I do, so your contribution would be very 
> welcome.  It may indeed make sense to create a separate PEP for them, 
> since they will mainly be needed by server authors and by people who 
> need to make use of some set of HTTP/1.1 features.

OK, I'll take that as a challenge :)  I agree that it doesn't make 
sense to put this onto the critical path for WSGI getting into a PEP.


> Other areas that need to be addressed within HTTP/1.1 probably also 
> include things like byte ranges.

Ah, yes.


--
Mark Nottingham     http://www.mnot.net/

From mnot at mnot.net  Fri Aug 27 05:44:45 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Fri Aug 27 05:44:54 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <412E7A58.6030800@colorstudy.com>
References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
	<412E7A58.6030800@colorstudy.com>
Message-ID: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net>


On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote:

>> * cache validation - does the server handle If-Modified-Since and 
>> If-None-Match requests appropriately (e.g., with a 304)?
>
> I would almost certainly expect this to be false.  There may be some 
> WSGI servers that have an extended notion of the application, so they 
> can look at things like the modification date.  But those are likely 
> to be uncommon -- more likely only applications will know the 
> necessary information.

Apache CGI does it; i.e., if you set a Last-Modified header, it'll 
automagically handle validation for you.

This is pretty old, but gives an indication of what Web servers do in 
this and other regards:
   http://www.mnot.net/papers/capabilities.pdf


> When I think of middleware, I can think of many things like this.  In 
> most cases, I'd add a key, and if the key wasn't present I'd know it 
> was false.  But it can be odd.  Say I have a middleware that catches 
> exceptions, because that's my one example at the moment.  If it is 
> present, it would be nice if other applications didn't catch 
> exceptions, and let them propagate all the way up.  So, the 
> application looks for 
> environ.get('ianb_middleware.exception_catcher')?  That's weird, 
> because someone else comes along and makes their own exception catcher 
> that works like mine; what key do they use?  It would be nice if we 
> used the same key.
>
> But then, at this point I might suggest we use 
> 'webapp0.exception_catcher', leading up to a Web App standard that 
> defines the meaning for a bunch more keys ('webapp1.exception_catcher' 
> once we agree on a standard).
>
> Anyway, that's my theory on how this might go.

I can totally see this stuff happening on a more ad hoc basis. We did 
similar things at Akamai; i.e., putting together a 
dynamically-configured pipeline of handlers to implement HTTP 
functionality, as well as content transforms. Very useful and very 
cool.

--
Mark Nottingham     http://www.mnot.net/

From pje at telecommunity.com  Fri Aug 27 06:02:45 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 06:02:24 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net>
References: <412E7A58.6030800@colorstudy.com>
	<479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
	<412E7A58.6030800@colorstudy.com>
Message-ID: <5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com>

At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote:

>On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote:
>
>>>* cache validation - does the server handle If-Modified-Since and 
>>>If-None-Match requests appropriately (e.g., with a 304)?
>>
>>I would almost certainly expect this to be false.  There may be some WSGI 
>>servers that have an extended notion of the application, so they can look 
>>at things like the modification date.  But those are likely to be 
>>uncommon -- more likely only applications will know the necessary information.
>
>Apache CGI does it; i.e., if you set a Last-Modified header, it'll 
>automagically handle validation for you.

I guess the relevance of this depends on whether bandwidth or CPU is the 
scarcer resource.  If you want to save CPU, the application should do this, 
so it doesn't have to produce a response body it doesn't need.  If all you 
care about is bandwidth, then certainly the server can truncate the body.

I'm inclined to make this guideline permissive: a server *may* treat 
write() as a no-op and change the status if it can do so safely.  But I 
don't think servers should be required to do this.
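
As a rough illustration of the CPU-saving case above, here is a minimal sketch of an application that answers a conditional GET itself, so the body is never produced when a 304 will do. It is written against the later PEP 3333 form of the interface; `LAST_MODIFIED` and the body text are made-up values for the example, not anything from the draft.

```python
from email.utils import formatdate, parsedate_to_datetime

LAST_MODIFIED = 1093579200  # hypothetical resource timestamp (seconds since epoch)


def app(environ, start_response):
    """Answer a conditional GET without building the body when possible."""
    headers = [("Last-Modified", formatdate(LAST_MODIFIED, usegmt=True))]
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if since:
        try:
            if LAST_MODIFIED <= parsedate_to_datetime(since).timestamp():
                start_response("304 Not Modified", headers)
                return []  # no body is generated at all
        except (TypeError, ValueError):
            pass  # unparseable date: fall through to a full response
    body = b"expensive body\n"  # stands in for costly page generation
    headers += [("Content-Type", "text/plain"),
                ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

A server that only truncates the body after the fact would still pay for the "expensive body" step; that is the trade-off described above.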


[Ian:]
>>When I think of middleware, I can think of many things like this.  In 
>>most cases, I'd add a key, and if the key wasn't present I'd know it was 
>>false.  But it can be odd.  Say I have a middleware that catches 
>>exceptions, because that's my one example at the moment.  If it is 
>>present, it would be nice if other applications didn't catch exceptions, 
>>and let them propagate all the way up.  So, the application looks for 
>>environ.get('ianb_middleware.exception_catcher')?  That's weird, because 
>>someone else comes along and makes their own exception catcher that works 
>>like mine; what key do they use?  It would be nice if we used the same key.

I'm somewhat negative on this concept; to me an application should be 
responsible for catching its own exceptions, or require a middleware 
wrapping for it.  The server/gateway *has* to be responsible for catching 
any otherwise uncaught exceptions.  I don't really get the concept of 
wanting to *not* catch exceptions.  If you have a two-layer model 
(app+exception catcher), just put the handler you want to use in place as 
middleware.  If the app has its own exception handling, surely it knows 
better how to handle the exception than anything else, so why change?
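
The two-layer model (app + exception catcher) could be sketched roughly as follows. This assumes the later PEP 3333 signature, including the optional `exc_info` argument to `start_response`; the name `catch_exceptions` and the logger name are hypothetical.

```python
import logging
import sys

log = logging.getLogger("exception_catcher")  # hypothetical logger name


def catch_exceptions(app):
    """Wrap a WSGI app so otherwise uncaught exceptions become a 500."""
    def wrapped(environ, start_response):
        try:
            result = app(environ, start_response)
            try:
                # Consume the iterable here so that errors raised during
                # iteration are caught too (this buffers the whole body).
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            log.exception("unhandled error for %s",
                          environ.get("PATH_INFO", ""))
            body = b"Internal Server Error\n"
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))],
                           sys.exc_info())
            return [body]
    return wrapped
```

Swapping in a different handler is then a matter of wrapping the app with a different function; the app itself does not need to know which one is in place.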


[Mark:]
>I can totally see this stuff happening on a more ad hoc basis. We did 
>similar things at Akamai; i.e., putting together a dynamically-configured 
>pipeline of handlers to implement HTTP functionality, as well as content 
>transforms. Very useful and very cool.

Ah, now I see why you know so much about HTTP/1.1 issues "in the field".  :)

From pje at telecommunity.com  Fri Aug 27 06:07:14 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 06:06:49 2004
Subject: [Web-SIG] Servers ignoring application-supplied headers
In-Reply-To: <200408270120.30569.jim-web-sig@jimdabell.com>
Message-ID: <5.1.1.6.0.20040827000412.02399e40@mail.telecommunity.com>

At 01:20 AM 8/27/04 +0100, Jim Dabell wrote:

> > In general, the server or gateway is responsible for ensuring that correct
> > headers are sent to the client: if the application omits a needed header,
> > the server or gateway *should* add it.  For example, the HTTP ``Date:`` and
> > ``Server:`` headers would normally be supplied by the server or gateway.  If
> > the application supplies a header that the server would ordinarily supply,
> > or that contradicts the server's intended behavior (e.g. supplying a
> > different ``Connection:`` header), the server or gateway *may* discard the
> > conflicting header, provided that its action is recorded for the benefit of
> > the application author.
>
>Is this wise?  It's not really the WSGI's job to nanny the application and
>make sure it does the right thing.  I can see the case for supplying default
>values, but simply throwing away something it's specifically been asked to
>use seems rather shortsighted.  WSGI authors aren't perfect, and it's far too
>easy to end up in a situation where application developers are stuck behind a
>clueless WSGI that insists on ignoring certain things because it thinks it's
>the right thing to do.  It seems to me that if the application developers
>want to do something, WSGI shouldn't make it intentionally impossible for
>them to do.
>
>The worst that is likely to happen is the application developer tries
>something and it breaks, so he doesn't try it again, right?

Fair enough.  I should probably narrow that phrasing more specifically to 
the issue I had in mind.  Specifically, it shouldn't be the application's 
job to control whether the connection will persist or not.  That's 
something that (IMO) belongs squarely in the server/gateway's bailiwick.  I 
guess I was just trying to get away without studying the 
keep-alive/connection header specs enough to be more specific.  :)

It may be that there are other response headers that similarly should be 
the exclusive preserve of the server, but I don't know what they are at 
present.
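
For illustration only, a gateway could enforce the narrower rule along these lines. The sketch uses `wsgiref.util.is_hop_by_hop`, which exists in today's standard library but postdates this discussion; `filter_app_headers` and the logger name are hypothetical.

```python
import logging
from wsgiref.util import is_hop_by_hop

log = logging.getLogger("gateway")  # hypothetical logger name


def filter_app_headers(headers):
    """Drop connection-level headers supplied by the application.

    Every discarded header is logged, so the action is recorded for the
    benefit of the application author.
    """
    kept = []
    for name, value in headers:
        if is_hop_by_hop(name):
            log.warning("discarding application header %s: %s", name, value)
        else:
            kept.append((name, value))
    return kept
```

Everything that is not a hop-by-hop header (``Connection``, ``Keep-Alive``, ``Transfer-Encoding`` and so on) passes through untouched, which addresses the objection about throwing away what the application asked for.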

From pje at telecommunity.com  Fri Aug 27 06:11:49 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 06:11:24 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
	<5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827000752.0239b2b0@mail.telecommunity.com>

At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote:
>Digest auth sucks much less, and also uses REMOTE_USER.

As I said, REMOTE_USER in a CGI environment leads to nasty local-system 
security holes: potentially a local user can just set 
REMOTE_USER=whoeverIwantToBe and invoke the application.

Maybe we should, however, have a configuration key for 
'wsgi.auth_available' that indicates the availability of the 
HTTP_AUTHORIZATION header.  Absence of 'wsgi.auth_available' would mean 
that the availability is unknown, while true or false would indicate 
definite availability or lack thereof.
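
A small sketch of how an application might read such a three-state key. Note that 'wsgi.auth_available' is only the name floated in this message, not an agreed part of the spec, and `auth_header_status` is a hypothetical helper.

```python
def auth_header_status(environ):
    """Interpret the proposed 'wsgi.auth_available' key as three states."""
    available = environ.get("wsgi.auth_available")  # proposed key; may be absent
    if available is None:
        return "unknown"  # the server said nothing either way
    return "available" if available else "unavailable"
```

The point of the absent case is that an application cannot tell a silent server from one that deliberately withholds HTTP_AUTHORIZATION, so it should not treat "missing" as "false".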

From ianb at colorstudy.com  Fri Aug 27 06:11:28 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 06:11:34 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com>
References: <412E7A58.6030800@colorstudy.com>
	<479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
	<412E7A58.6030800@colorstudy.com>
	<5.1.1.6.0.20040826235336.023b1ab0@mail.telecommunity.com>
Message-ID: <412EB470.7010407@colorstudy.com>

Phillip J. Eby wrote:
>>> When I think of middleware, I can think of many things like this.  In 
>>> most cases, I'd add a key, and if the key wasn't present I'd know it 
>>> was false.  But it can be odd.  Say I have a middleware that catches 
>>> exceptions, because that's my one example at the moment.  If it is 
>>> present, it would be nice if other applications didn't catch 
>>> exceptions, and let them propagate all the way up.  So, the 
>>> application looks for 
>>> environ.get('ianb_middleware.exception_catcher')?  That's weird, 
>>> because someone else comes along and makes their own exception 
>>> catcher that works like mine; what key do they use?  It would be nice 
>>> if we used the same key.
> 
> 
> I'm somewhat negative on this concept; to me an application should be 
> responsible for catching its own exceptions, or require a middleware 
> wrapping for it.  The server/gateway *has* to be responsible for 
> catching any otherwise uncaught exceptions.  I don't really get the 
> concept of wanting to *not* catch exceptions.  If you have a two-layer 
> model (app+exception catcher), just put the handler you want to use in 
> place as middleware.  If the app has its own exception handling, surely 
> it knows better how to handle the exception than anything else, so why 
> change?

Generally the app doesn't know how to best handle unexpected exceptions.
There's no "right" way to handle unexpected exceptions, because they
are unexpected.  Handling unexpected exceptions is usually
installation-specific.  Imagining a heterogeneous setup with multiple 
applications, it would be annoying to configure each application when 
you could group them, and to deal with some applications having poor 
support for debugging vs. others.  E.g., a good exception catcher will 
log lots of information for post-mortem debugging, notify the 
appropriate person, etc.  A poor exception catcher just prints out the 
traceback for the user.  Blech.

Certainly this could also be done as a library.  Maybe that's better, 
but I still like the idea of centralizing it.  I don't think it's so bad.
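
To make the shared-key idea concrete, here is a rough sketch of an application that defers to a centralized catcher when one announces itself in the environment. The key name 'webapp0.exception_catcher' is the provisional one suggested earlier in the thread, not a standard; `guarded_app` and the simulated failure are invented for the example.

```python
CATCHER_KEY = "webapp0.exception_catcher"  # provisional name from this thread


def guarded_app(environ, start_response):
    """Let errors propagate if an outer catcher is installed."""
    try:
        raise RuntimeError("simulated application failure")
    except Exception:
        if environ.get(CATCHER_KEY):
            raise  # the installation's catcher will log and report it
        # No catcher present: fall back to a bare-bones error page.
        body = b"Something went wrong\n"
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain"),
                        ("Content-Length", str(len(body)))])
        return [body]
```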

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Fri Aug 27 06:34:17 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 06:34:22 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net>
References: <479BBC96-F7B3-11D8-82BE-000A95BD86C0@mnot.net>
	<412E7A58.6030800@colorstudy.com>
	<6FF369B1-F7DB-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <412EB9C9.5030103@colorstudy.com>

Mark Nottingham wrote:
> On Aug 26, 2004, at 5:03 PM, Ian Bicking wrote:
> 
>>> * cache validation - does the server handle If-Modified-Since and 
>>> If-None-Match requests appropriately (e.g., with a 304)?
>>
>>
>> I would almost certainly expect this to be false.  There may be some 
>> WSGI servers that have an extended notion of the application, so they 
>> can look at things like the modification date.  But those are likely 
>> to be uncommon -- more likely only applications will know the 
>> necessary information.
> 
> 
> Apache CGI does it; i.e., if you set a Last-Modified header, it'll 
> automagically handle validation for you.

That seems... well, Apache is putting forward effort, but obviously it's 
not a terribly efficient way to go about it.  I think it would be fine 
if a server did that when it could, but I wouldn't leave it up to the 
server if the application was able to handle it on its own.  So it's not 
particularly important for the application to know if the server is 
going to do this, as it wouldn't change what the application does.  (So 
long as the application is giving an accurate Last-Modified header, 
which I think we should expect.)

But this made me think, the WSGI spec leaves lots of ways for the server 
to add extensions to the request, but not many ways to extend the 
application.  Presumably if you wanted the server to be able to handle 
this, the server would have to be able to query the application in some way.

In this case it would be sufficient to have the server implicitly query 
the application by looking at the headers it produces.  From there, it 
would want to abort the application (though it might be reasonable for 
the application to refuse to be aborted).  There's no allowance for 
this, nor can I think of an extension to allow it.
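
The "implicit query" can be approximated with a wrapper like the one below, offered only as a sketch under stated assumptions: it uses the `close()` hook and `exc_info` argument from the later PEP 3333 form of the interface, it does not support the `write()` callable, and it only helps when the application calls `start_response` before returning its iterable. The names `not_modified` and `validate_for` are hypothetical.

```python
from email.utils import parsedate_to_datetime


def not_modified(environ, headers):
    """True if the app's Last-Modified is no newer than If-Modified-Since."""
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    last = {k.lower(): v for k, v in headers}.get("last-modified")
    if not since or not last:
        return False
    try:
        return parsedate_to_datetime(last) <= parsedate_to_datetime(since)
    except (TypeError, ValueError):
        return False  # unparseable or incomparable dates: do not guess


def validate_for(app):
    """Inspect the headers the app produces, then possibly answer 304."""
    def wrapped(environ, start_response):
        seen = []

        def capture(status, headers, exc_info=None):
            seen[:] = [status, list(headers)]

            def write(data):
                raise NotImplementedError("write() is not handled in this sketch")
            return write

        result = app(environ, capture)
        try:
            if (seen and seen[0].startswith("200")
                    and not_modified(environ, seen[1])):
                kept = [h for h in seen[1]
                        if h[0].lower() not in ("content-length", "content-type")]
                start_response("304 Not Modified", kept)
                return []  # the body iterable is never consumed
            body = list(result)  # also lets lazy apps call start_response
        finally:
            if hasattr(result, "close"):
                result.close()
        start_response(seen[0], seen[1])
        return body
    return wrapped
```

Even here the application has already run far enough to produce its headers, and the only "abort" available is declining to iterate the body, which is close to the limitation described above.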

This is where application starts to blend into resource, which isn't the 
way WSGI looks at things (reasonably, since resources are much more 
complex and full of structure compared to applications).  As I think 
about it, I'm kind of talking myself out of the whole thing... but there 
are places where middleware could make good use by looking ahead into 
the application, but I don't think WSGI could be extended in that direction.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From paul.boddie at ementor.no  Fri Aug 27 12:32:58 2004
From: paul.boddie at ementor.no (Paul Boddie)
Date: Fri Aug 27 12:33:10 2004
Subject: [Web-SIG] Regarding the WSGI draft
Message-ID: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net>

Bob Kimble wrote:
->
-> I have been reading this thread for a while now, and I haven't commented
-> because I have done absolutely no web development using Python. However,
-> Mark's comments strike me as being dead on. I'm used to the Java Servlet
-> API, which creates an API for servlets and JSP pages. The fact that there
-> are several high quality application servers that all support this API
-> suggests to me that creating something similar for Python makes a lot of
-> sense. I have written JSP's and servlets and run them under Tomcat, but I
-> know that I could just as easily run them under WebSphere, WebLogic,
-> JRun, or any others that support the API.

Once the deployment gymnastics and library conflicts are dealt with, yes.
;-) It's an interesting point that I'll hint at briefly below that it isn't
exactly coincidence that the most popular Java frameworks are all based on
the Servlet API in some way.

-> It seems to me that creating a similar API for Python would
-> be terrific. Of course, somebody would also have to write an application
-> server to support the API, but I suspect some of the existing frameworks
-> could be revamped to support it. Anyway, that's my 2 cents. I would love
-> to see something similar to Tomcat and the Java Servlet API for Python.

Well, Webware was created with the Java Servlet API in mind, amongst other
inspirations, and there are certainly plenty of frameworks which follow the
same pattern. However, having looked into implementing the high-level
functionality that Mark Nottingham and Phillip Eby are presumably referring
to, and having looked into the differences between frameworks before (which
led to the increasingly incoherent WebProgramming Wiki page), any future
work of mine in that area will be done on top of WebStack:

http://www.python.org/pypi?%3Aaction=search&name=WebStack

Clearly, by even mentioning it I'm pushing some kind of agenda, but should I
want to develop some kind of Web application or framework, I'd rather have a
reasonably sane API which works across the major technologies (and does so
pretty well right now).

Titus Brown wrote:
>
> I'm moderately skeptical of the short term use of the API being
> developed on this list, because in practice it is relatively easy
> to implement a framework that fits on top of all of the existing
> adapters (CGI, mod_python, etc.)  Medium term, I think it will lead
> to a welcome homogenization of server <--> adapter <--> framework
> interaction, and so I think it's a valuable concept.

I think it depends how many frameworks you want to support and which ones
you choose. The work may be intellectually straightforward, but it isn't
necessarily trivial. As for the value of the WSGI concept, if it provides a
better foundation for higher-level frameworks and applications, then it's
obviously a good thing. I'm not totally convinced that lots of people might
want to run Webware on top of Twisted, for example, and that the Twisted
people will get excited by this very notion and do the work to make it
happen. (Although having now said that, they might rise to the challenge.)
Moreover, when it comes to "co-locating" applications, there exist some
pretty adequate solutions for doing so right now through Apache and other
generic Web server solutions.

> The idea of having a single framework (like Java's "servlets") is, I
> think, silly.  Having implemented sites in several of the existing
> frameworks, it is clear that there are several different ways to
> conceptualize the development of Web sites: the Quixote style and
> the WebWare style are two very distinct examples.  Anything that cuts
> down on the variety of available frameworks is going to restrict the
> options, which is bad.

There are a variety of Java frameworks which are based on the Servlet API
and which offer a range of fairly diverse development styles. Few people
really want to code applications directly against that API, but it's
misleading (if not wrong) to state that a standard API at such a low level
will somehow strongly constrain how you develop your applications.

As for the diversity of styles within Python Web frameworks, one has to ask
whether such a standard API is useful and can support things like Quixote,
SkunkWeb, Webware and Zope. If you split this analysis into the dispatching
of requests and the request objects themselves, the area of "sensible"
diversity is the dispatching mechanism - where Python frameworks differ in
their treatment of request and response information tends to be in how
comprehensive and consistent (or the opposite) the APIs for such concepts
are. Where dispatching is concerned, I dislike the way many Python
frameworks decide on one's behalf how the URLs are going to be interpreted,
and I welcome things like Ian Bicking's enhancements to Webware that have
removed such inconveniences since 0.8.1; much diversity is arguably
arbitrary in this area, anyway.

> However, I think it is incumbent upon the developers and users of the
> different frameworks to clearly distinguish between the various options.
> Right now it is very confusing to me, and I've been developing Web sites
> in Python for 5 years ;).

The problem is that the average developer has to choose something to start
out on, and having looked at most of the main frameworks I can say with
confidence that the average developer has to make a pretty big compromise
between things like API sanity, API popularity and deployment flexibility.
WSGI does its thing to make deployment less of an issue (along somewhat with
API popularity), but avoids the burning issue of the standard API that many
people within the Python frameworks scene insist isn't necessary whilst also
hinting that it could be a good thing. Certainly, I'd regard a discussion on
the need for such a thing as significantly more important than the decorator
discussion at the very least.

Paul
From brsizer at kylotan.eidosnet.co.uk  Fri Aug 27 14:56:17 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Fri Aug 27 14:55:00 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
Message-ID: <412F2F71.4040608@kylotan.eidosnet.co.uk>

Phillip J. Eby wrote:
> Unfortunately, every effort to date to create a "framework to end all 
> frameworks" has simply resulted in the existence of framework N+1.  
> Why?  Because the creation of a *new* framework means that there is no 
> existing code that uses it.  And if the framework only provides features 
> that others already have, there's no compelling reason to switch.

To quote http://www.python.org/sigs/web-sig/ again, "pick a Web
framework that already exists". To pick an example from my minimal
experience here, mod_python might make a good baseline because it's
reasonably low-level already and runs on Unix and Win32, yet provides
templating, dispatching, session handling, etc. Maybe if someone could
replicate a large subset of mod_python's functionality on IIS then you'd
have something very useful.

> Any approach that ignores the economic reality of present-day Python web 
> apps, and provides no way for them to migrate gradually to a new 
> standard, is doomed to niche status at best.

I see your point but I look at this from the other side; any approach
that is focused on the current niche status of Python web apps, is
doomed to perpetuate that niche status. No new framework or API or
'standard' Python web service is going to break existing code, just
provide an alternative. Why therefore is there such a focus on
accommodating existing users and having them migrate over? This sounds
too much like preaching to the converted to me.

> And so, the only way we're going to "steal" the marketshare of
> existing frameworks is with the consent and co-operation of the
> developers of those frameworks.

I don't think the idea is to steal the marketshare of existing
frameworks, as you put it. Rather, I'd think it would be about capturing
the imagination of the average developer who would appreciate Python as
a language. Some people won't use ASP because of the Microsoft aspect,
or won't use PHP because of the Perl/C syntax. These are people who
would probably be very interested in using an open language such as
Python for this sort of thing.

> First, the current situation.  Choice of framework is a high investment 
> for users, because once they choose, they are stuck with that framework 
> and possibly server.

This is why I would like Python to have web support in the standard
library that is on a high enough level that you don't necessarily need a
framework to achieve something useful.

I readily agree that something such as WSGI would nicely form the
backbone of the interchangeable modules. All I disagree with is that you
should then /need/ one of these competing frameworks on top of that
before you can do anything useful. Hence my worry about the insistence
on such frameworks. As it stands, web development is pretty much the
only commonplace task that I can't achieve with Python using either the
standard library or an obvious 3rd party package.

> Because these services will be interchangeable to some degree, lock-in 
> is limited and competition will determine a winner or winners.  Then, if 
> the winners are sufficiently similar to allow useful standardization, 
> that's the natural next step.  But, for some services, the differences 
> will be important qualitative differences, and standardization would 
> reduce meaningful choice.  We don't know in advance what these services 
> should be, and we don't know enough to standardize on them now.
> 
> For someone with an ASP or PHP background, that last statement at least 
> might sound like sheer lunacy.

It sounds more like refusing to sell ready meals in stores because of
the insistence that everybody likes their food cooked in different ways.
If we provided a simpler and effective baseline, even if that standard
only featured 90% of the power and flexibility of existing services,
then I expect we'd see a rapid take-up of that technology.

In no way do I think that the current services and frameworks are
useless. I just worry that there's this focus on what I'd vaguely call
'enterprise' level web development that shuts out the majority of
developers who are trying to do something simpler.

-- 
Ben Sizer.


From brsizer at kylotan.eidosnet.co.uk  Fri Aug 27 15:18:57 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Fri Aug 27 15:17:19 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net>
Message-ID: <412F34C1.7020704@kylotan.eidosnet.co.uk>

Paul Boddie wrote:

> It's an interesting point that I'll hint at briefly below that
> it isn't exactly coincidence that the most popular
> Java frameworks are all based on the Servlet API in some way.

I'd certainly argue that it's no coincidence that the Servlet API - from 
what I can see - has built in support for sessions, handling form data, 
query strings, etc.

What worries me about the talk on this list is that people are aspiring 
to give Python web development all the complexity of the Java 
methodology with almost none of the convenience!

Personally I think that Java Servlets are still too low-level for your 
average web developer, so to see the implication that they're too 
high-level and therefore somehow limiting framework diversity is worrying.

-- 
Ben Sizer

From pje at telecommunity.com  Fri Aug 27 15:40:15 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 15:39:55 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412F2F71.4040608@kylotan.eidosnet.co.uk>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com>

At 01:56 PM 8/27/04 +0100, Ben Sizer wrote:
>Phillip J. Eby wrote:
>>Any approach that ignores the economic reality of present-day Python web 
>>apps, and provides no way for them to migrate gradually to a new 
>>standard, is doomed to niche status at best.
>
>I see your point but I look at this from the other side; any approach
>that is focused on the current niche status of Python web apps,

Huh?  I meant niche status *within* the Python community.  My point is that 
trying to promote another framework isn't going to get much past the noise 
level in communicating about *current* frameworks.


>No new framework or API or
>'standard' Python web service is going to break existing code, just
>provide an alternative. Why therefore is there such a focus on
>accommodating existing users and having them migrate over?

Because if providing an alternative would actually change anything, then 
things should have already changed by now.  Simply providing a new 
framework will not create any technical or social network effects, so that 
leaves marketing as the force to drive adoption.  And that marketing will 
be limited to either 1) new users, or 2) people you can get to switch from 
existing frameworks.


>>And so, the only way we're going to "steal" the marketshare of
>>existing frameworks is with the consent and co-operation of the
>>developers of those frameworks.
>
>I don't think the idea is to steal the marketshare of existing
>frameworks, as you put it. Rather, I'd think it would be about capturing
>the imagination of the average developer who would appreciate Python as
>a language

Okay, so you want to recruit non-Python developers; fine.  Go for 
it.  That's entirely orthogonal to what I'm trying to do with WSGI.  But, I 
think you'll have an easier time of it once WSGI is ubiquitous and APIs 
emerge that you can then use as a standard.


>>First, the current situation.  Choice of framework is a high investment 
>>for users, because once they choose, they are stuck with that framework 
>>and possibly server.
>
>This is why I would like Python to have web support in the standard
>library that is on a high enough level that you don't necessarily need a
>framework to achieve something useful.

As a practical matter, you'll need community support to get something like 
that in the standard library, and the political reality of the community is 
that you'll have to show why accepting your new framework N+1 doesn't mean 
that frameworks 1 through N should also be included.


>I readily agree that something such as WSGI would nicely form the
>backbone of the interchangeable modules. All I disagree with is that you
>should then /need/ one of these competing frameworks on top of that
>before you can do anything useful.

It's not that you should *need* one of them; it's merely that if adding 
such features gets in the way of WSGI goals, then we don't add such 
features.  And adding those features gets in the way of the goals if it 
adds unwarranted complexity to servers, gateways, or middleware.

Therefore, it may make sense for those wishing to devise a higher-level or 
"friendlier" API, to build it atop WSGI in parallel with the current 
standardization efforts, and then propose it as a stdlib addition once WSGI 
is stable and has seen some adoption, perhaps as part of an effort to 
upgrade the stdlib-included web servers and gateways to support WSGI.

Such efforts would be easier to spin as "tools for WSGI" rather than "web 
framework N+1".


>It sounds more like refusing to sell ready meals in stores because of
>the insistence that everybody likes their food cooked in different ways.

Not at all; we have dozens of lines of ready meals with names like 
Albatross, CherryPy, SkunkWeb, Quixote, and so on.  It's merely that the 
marketplace is already crowded with such manufacturers and launching a new 
line to compete with them isn't likely to be a profitable venture.

Instead, we've decided to standardize packaging materials and sell boxes 
and trays and suchlike to all the existing meal manufacturers.  :)


>If we provided a simpler and effective baseline, even if that standard
>only featured 90% of the power and flexibility of existing services,
>then I expect we'd see a rapid take-up of that technology.

You must be making some assumptions that aren't clear to me.  If existing 
services provide 100% of those capabilities, why hasn't one of them already 
taken the lead?  Perhaps you think that endorsement by the Web-SIG is all 
that's needed.  Maybe that could work, I don't know.

But, how will you obtain the endorsement of the Web-SIG?  Keep in mind that 
a lot of the people actually doing any work on the Web-SIG are authors of 
existing frameworks, which means to get buy-in you have to support their 
goals.  To support their goals, you need an API that allows them to 
continue to scratch whatever itches prompted them to write their particular 
framework in the first place, not to mention avoid losing their investment 
in application code already written to that API.

But, the higher the level of abstraction in the API, the greater the chance 
that some facility on which they depend, will not be expressible in the 
high-level API so as to allow them to continue to use code based on their 
existing APIs, and therefore the more difficult it will be to get the 
support of those participants.

WSGI lets us bypass all this, by beginning with something that everybody 
can use, because everybody's using HTTP, and WSGI only deals with HTTP.
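
To show how little surface that is, here is a minimal sketch of an application and a throwaway harness that calls it. It follows the later PEP 3333 form of the interface rather than the draft under discussion here; `hello` and `run_once` are illustrative names, and `setup_testing_defaults` comes from the `wsgiref` package that was added to the standard library afterwards.

```python
from wsgiref.util import setup_testing_defaults


def hello(environ, start_response):
    """An application sees only CGI-style request data and HTTP headers."""
    text = "%s %s\n" % (environ["REQUEST_METHOD"], environ["PATH_INFO"])
    body = text.encode("utf-8")
    start_response("200 OK",
                   [("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body)))])
    return [body]


def run_once(app, path="/"):
    """Call a WSGI app once, the way a trivial server might."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.* keys, etc.
    out = {}

    def start_response(status, headers, exc_info=None):
        out["status"] = status
        out["headers"] = headers

    out["body"] = b"".join(app(environ, start_response))
    return out
```

Nothing in `hello` refers to sessions, templates or dispatching; any framework's own request objects could be built on top of the same `environ` dictionary.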


>In no way do I think that the current services and frameworks are
>useless. I just worry that there's this focus on what I'd vaguely call
>'enterprise' level web development that shuts out the majority of
>developers who are trying to do something simpler.

Fair enough.  But personally, if that were my goal, I'd design *two* APIs: 
one to emulate ASP, and the other to emulate PHP.  Then I'd write 
translators to do the mechanical work of translating most of the syntax to 
e.g. PSP.  That would do *much* more to bring in non-Python developers than 
any new Python framework would.  For one thing, the mere existence of such 
a tool for ASP or PHP applications would create vast amounts of publicity 
(blog postings, articles, etc.) that money couldn't buy, and they'd be in 
exactly the right places: where those non-Python programmers will see them.

By contrast, announcing another Python framework seems to me unlikely to 
create a splash even *within* the Python community, let alone outside it.

From ianb at colorstudy.com  Fri Aug 27 18:23:02 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 18:24:36 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412F2F71.4040608@kylotan.eidosnet.co.uk>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
Message-ID: <412F5FE6.5000504@colorstudy.com>

Ben Sizer wrote:
>> Any approach that ignores the economic reality of present-day Python 
>> web apps, and provides no way for them to migrate gradually to a new 
>> standard, is doomed to niche status at best.
> 
> I see your point but I look at this from the other side; any approach
> that is focused on the current niche status of Python web apps, is
> doomed to perpetuate that niche status. No new framework or API or
> 'standard' Python web service is going to break existing code, just
> provide an alternative. Why therefore is there such a focus on
> accommodating existing users and having them migrate over? This sounds
> too much like preaching to the converted to me.

Well, from a purely sociological tack: We need to pay attention to 
current frameworks because this is open source -- it's not just a matter 
of converting users, it's a matter of converting contributing 
developers.  Getting a bunch of users only helps an open source project 
if those users contribute back to the project.  Past performance is some 
indication of future success, so it's best to try to get *current* open 
source web developers on board.

>> And so, the only way we're going to "steal" the marketshare of
>> existing frameworks is with the consent and co-operation of the
>> developers of those frameworks.
> 
> I don't think the idea is to steal the marketshare of existing
> frameworks, as you put it. Rather, I'd think it would be about capturing
> the imagination of the average developer who would appreciate Python as
> a language. Some people won't use ASP because of the Microsoft aspect,
> or won't use PHP because of the Perl/C syntax. These are people who
> would probably be very interested in using an open language such as
> Python for this sort of thing.
> 
>> First, the current situation.  Choice of framework is a high 
>> investment for users, because once they choose, they are stuck with 
>> that framework and possibly server.
> 
> 
> This is why I would like Python to have web support in the standard
> library that is on a high enough level that you don't necessarily need a
> framework to achieve something useful.

We already have that support, and it even works pretty well with WSGI: 
the cgi module.  What, the cgi module is stupid and annoying you say? 
(Well, if you won't say it I will.)  To me that's evidence that Just Any 
Old Thing won't do.

> I readily agree that something such as WSGI would nicely form the
> backbone of the interchangeable modules. All I disagree with that you
> should then /need/ one of these competing frameworks on top of that
> before you can do anything useful. Hence my worry about the insistence
> on such frameworks. As it stands, web development is pretty much the
> only commonplace task that I can't achieve with Python using either the
> standard library or an obvious 3rd party package.

Why is WSGI's limited scope a problem?  I feel fairly certain that we 
can get WSGI approved and start building things on it fairly soon, but 
anything more expansive will take much, much longer to move forward on. 
Your more expansive desires have been out there for a long time, if 
not proposed by yourself, proposed by other people (including me).  And 
yet there's been little forward movement in terms of any standard. 
"Little" is probably an overstatement in that last sentence.  We are 
moving ahead with a smaller step -- that's still much more forward 
progress than before.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From ianb at colorstudy.com  Fri Aug 27 18:27:53 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 18:28:36 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net>
References: <89DE0F3E9781C048A14DC88C06D9F93D13EF30@100nooslmsg005.common.alpharoot.net>
Message-ID: <412F6109.30807@colorstudy.com>

Paul Boddie wrote:
> I think it depends how many frameworks you want to support and which 
> ones you choose. The work may be intellectually straightforward, but
> it isn't necessarily trivial. As for the value of the WSGI concept,
> if it provides a better foundation for higher-level frameworks and
> applications, then it's obviously a good thing. I'm not totally
> convinced that lots of people might want to run Webware on top of
> Twisted, for example, and that the Twisted people will get excited by
> this very notion and do the work to make it happen. (Although having
> now said that, they might rise to the challenge.) Moreover, when it
> comes to "co-locating" applications, there exist some pretty
> adequate solutions for doing so right now through Apache and other 
> generic Web server solutions.

This is open source -- the Twisted people don't have to be very excited 
about Webware in order for Webware to run on Twisted.  *Someone* has to 
be excited about it, that's all.

But WSGI takes it one further -- instead of the NxM problem which you 
are addressing with WebStack (well, Nx1 in that case, but NxM if you 
started nesting arbitrary frameworks), simply by making Webware run on 
WSGI, and making Twisted into a WSGI server, they could be used 
together.  So I think there's more reason to be optimistic about the 
possibilities.
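
A minimal sketch of that N+M arrangement, assuming the application
interface as it was later published in PEP 333 / PEP 3333. The names
hello_app and call_app are hypothetical, and call_app only stands in
for a real server: it builds a deliberately incomplete environ, whereas
a real server supplies the full set of CGI variables and wsgi.* keys.

```python
# Illustrative only: the framework side exposes one callable, the server
# side knows how to call any such callable. Neither knows about the other.

def hello_app(environ, start_response):
    """Application side: what a framework would expose to any server."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path):
    """Server side (stand-in): build environ, call the app, collect output."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    # Incomplete on purpose; a real server fills in much more than this.
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Any server that can do what call_app does can host any application
shaped like hello_app, which is why N frameworks and M servers need
N+M adapters rather than NxM.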

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From amk at amk.ca  Fri Aug 27 18:33:40 2004
From: amk at amk.ca (A.M. Kuchling)
Date: Fri Aug 27 18:34:03 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412F5FE6.5000504@colorstudy.com>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
	<412F5FE6.5000504@colorstudy.com>
Message-ID: <20040827163340.GA29076@rogue.amk.ca>

On Fri, Aug 27, 2004 at 11:23:02AM -0500, Ian Bicking wrote:
> Why is WSGI's limited scope a problem?  I feel fairly certain that we 
> can get WSGI approved and start building things on it fairly soon, but 
> anything more expansive will take much, much longer to move forward on. 

Definite agreement.  When faced with a problem, it doesn't take very
long to build a large list of requirements, a list large enough to
frighten away all potential implementors.  Python developers seem to
suffer from this problem to an extreme degree.

It's unfortunate that WSGI probably isn't going to be finished in time
for Python 2.4, so that BaseHTTPServer or some similar class could
support it in the stdlib.  2.4alpha3 is scheduled for September 3rd,
and is planned to be the last alpha; no new features are introduced at
the beta stage, so that means WSGI support would have to wait until
Python 2.5.

--amk

From pje at telecommunity.com  Fri Aug 27 18:43:56 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 18:43:33 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <20040827163340.GA29076@rogue.amk.ca>
References: <412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
	<412F5FE6.5000504@colorstudy.com>
Message-ID: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>

At 12:33 PM 8/27/04 -0400, A.M. Kuchling wrote:
>It's unfortunate that WSGI probably isn't going to be finished in time
>for Python 2.4, so that BaseHTTPServer or some similar class could
>support it in the stdlib.  2.4alpha3 is scheduled for September 3rd,
>and is planned to be the last alpha; no new features are introduced at
>the beta stage, so that means WSGI support would have to wait until
>Python 2.5.

That's one week: BaseHTTPServer is HTTP/1.0-based if I recall correctly, so 
whipping up support shouldn't take too long.  I have a draft WSGIServer 
based on the December draft of the PEP, so it'd just have to be beefed 
up.  Also, I think a CGI-based gateway (with some kind of error handling) 
should go in, and perhaps the utility functions we discussed previously.
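
A rough sketch of what such a CGI-based gateway with basic error
handling might look like. It assumes the interface as later published
in PEP 3333, not the draft under discussion here, and the stream
parameters are hypothetical additions so the function can be exercised
without a real CGI environment.

```python
import os
import sys
import traceback


def run_with_cgi(app, environ=None, stdin=None, stdout=None, stderr=None):
    """Run a single WSGI application call as one CGI request (sketch)."""
    environ = dict(os.environ if environ is None else environ)
    stdin = stdin if stdin is not None else getattr(sys.stdin, "buffer", sys.stdin)
    stdout = stdout if stdout is not None else getattr(sys.stdout, "buffer", sys.stdout)
    stderr = stderr if stderr is not None else sys.stderr
    environ.update({
        "wsgi.version": (1, 0),
        "wsgi.input": stdin,
        "wsgi.errors": stderr,
        "wsgi.url_scheme": "http",
        "wsgi.multithread": False,
        "wsgi.multiprocess": True,
        "wsgi.run_once": True,
    })
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = list(headers)

    try:
        result = app(environ, start_response)
        try:
            # Buffer the whole body so an error can still become a clean 500.
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()
        status, headers = response["status"], response["headers"]
    except Exception:
        # Log the traceback to the error stream and send a generic 500.
        traceback.print_exc(file=stderr)
        status = "500 Internal Server Error"
        headers = [("Content-Type", "text/plain")]
        body = b"Internal Server Error\n"

    lines = ["Status: %s" % status] + ["%s: %s" % h for h in headers]
    stdout.write(("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1"))
    stdout.write(body)
    stdout.flush()
```

Buffering the body trades streaming for simplicity; a production
gateway would write chunks as the application yields them.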

Documentation is an issue, though, and perhaps tests as well.  Also, I sent 
in the PEP the day before yesterday and still don't have a PEP number.  So 
getting community support for the PEP in the time remaining might be tough, 
too.

From amk at amk.ca  Fri Aug 27 19:11:31 2004
From: amk at amk.ca (A.M. Kuchling)
Date: Fri Aug 27 19:11:56 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
References: <412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
	<412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
Message-ID: <20040827171131.GC29144@rogue.amk.ca>

On Fri, Aug 27, 2004 at 12:43:56PM -0400, Phillip J. Eby wrote:
> Documentation is an issue, though, and perhaps tests as well.  Also, I sent 
> in the PEP the day before yesterday and still don't have a PEP number.  

Editors pinged.  Coincidentally, it'll probably be PEP 333. PEP 222
was also web-related, so no more web PEPs until #444.

IMHO it should have become a PEP much earlier.  That gives a single
place to point at the current draft, rather than having to point to a
particular message in the Web-SIG list archive.  It doesn't matter if
the draft is incomplete -- we have PEPs that are just titles, so the
WSGI spec was ahead of the game from the beginning.

--amk

From brsizer at kylotan.eidosnet.co.uk  Fri Aug 27 19:26:05 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Fri Aug 27 19:24:31 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com>
Message-ID: <412F6EAD.9050702@kylotan.eidosnet.co.uk>

Phillip J. Eby wrote:
> At 01:56 PM 8/27/04 +0100, Ben Sizer wrote:
>> No new framework or API or
>> 'standard' Python web service is going to break existing code, just
>> provide an alternative. Why therefore is there such a focus on
>> accommodating existing users and having them migrate over?
> 
> Because if providing an alternative would actually change anything, then 
> things should have already changed by now.  Simply providing a new 
> framework will not create any technical or social network effects, so 
> that leaves marketing as the  force to drive adoption.

But if that framework is distributed as a standard library module, 
surely it will immediately gain wider recognition - and thus adoption - 
and also momentum towards improving it.

>> This is why I would like Python to have web support in the standard
>> library that is on a high enough level that you don't necessarily need a
>> framework to achieve something useful.
> 
> As a practical matter, you'll need community support to get something 
> like that in the standard library, and the political reality of the 
> community is that you'll have to show why accepting your new framework 
> N+1 doesn't mean that frameworks 1 through N should also be included.

Does the suggestion in the Web-SIG charter no longer hold true then? I'm 
genuinely interested in the answer to that because the implication from 
reading it is that Python needs at least one 'good-enough' system in the 
standard library. However the implication from this list is that Web-SIG 
is more interested in catering for those who have already solved this 
problem and just want a bit more interoperability.

> Not at all; we have dozens of lines of ready meals with names like 
> Albatross, CherryPy, SkunkWeb, Quixote, and so on.  It's merely that the 
> marketplace is already crowded with such manufacturers and launching a 
> new line to compete with them isn't likely to be a profitable venture.

Sadly each of these seems to be subtly different, often with no real 
benefit to the user in those differences. For instance, all the 
different templating styles - looking at the difference between Cheetah, 
PSP, jonpy.wt, and Spyce, is there really any need for all of them? It 
seems to be a case of different syntax yet same semantics in 90% of 
cases. All of these packages seem to be of high quality and are notable 
achievements in themselves, yet I don't see that they're /really/ 
offering anything so unique that standardisation would handicap the end 
user.

>> If we provided a simpler and effective baseline, even if that standard
>> only featured 90% of the power and flexibility of existing services,
>> then I expect we'd see a rapid take-up of that technology.
> 
> You must be making some assumptions that aren't clear to me.  If 
> existing services provide 100% of those capabilities, why hasn't one of 
> them already taken the lead?

In my opinion, it's because they're underdocumented and/or overcomplex, 
and non-standard. I wouldn't know where to start with Zope. It took me a 
while to work out how to get something useful out of mod_python. 
Webware looks nice but provides an awful lot, 90% of which most people 
won't need, making it hard for beginners to get to grips with. And so 
on. I expect all of these and more besides could do /everything/ I would 
ever need from a Python web development platform. The question is 
whether it's worthwhile, given the other languages and tools available 
to me.

> But, how will you obtain the endorsement of the Web-SIG?  Keep in mind 
> that a lot of the people actually doing any work on the Web-SIG are 
> authors of existing frameworks, which means to get buy-in you have to 
> support their goals. 

I'm not interested in SIG politics, to be honest. If everybody goes away 
from this and decides I'm wrong or that my points are irrelevant to 
their needs, that's fair enough. I just didn't want to be the guy 
complaining 6 months from now and getting told, "well, you should have 
brought this up on Web-SIG earlier". I like Python as a language and 
just wished that there wasn't this paradox where such a simple and clean 
language doesn't have the simple and clean access to web objects that 
ASP and PHP do.

-- 
Ben Sizer.


From brsizer at kylotan.eidosnet.co.uk  Fri Aug 27 19:52:33 2004
From: brsizer at kylotan.eidosnet.co.uk (Ben Sizer)
Date: Fri Aug 27 19:50:51 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412F5FE6.5000504@colorstudy.com>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
	<412F5FE6.5000504@colorstudy.com>
Message-ID: <412F74E1.9060508@kylotan.eidosnet.co.uk>

Ian Bicking wrote:
> Ben Sizer wrote:
>> This is why I would like Python to have web support in the standard
>> library that is on a high enough level that you don't necessarily need a
>> framework to achieve something useful.
> 
> We already have that support, and it even works pretty well with WSGI: 
> the cgi module.  What, the cgi module is stupid and annoying you say? 
> (Well, if you won't say it I will.)  To me that's evidence that Just Any 
> Old Thing won't do.

I agree that the cgi module won't do, but that's because I disagree that 
the cgi module is "on a high enough level". I do think that support for 
sessions, query strings, form handling, templating, and various 
url-parsing and html-escaping requirements need to be in that module for 
it to be considered high-level by my (admittedly subjective) standards.

> Why is WSGI's limited scope a problem?  I feel fairly certain that we 
> can get WSGI approved and start building things on it fairly soon, but 
> anything more expansive will take much, much longer to move forward on. 
> Your more expansive desires have been out there for a long time, if not 
> proposed by yourself, proposed by other people (including me).

The only reason I think the limited scope is a problem is because it 
doesn't get me significantly closer to being able to say to my friends 
"Python is a great language for developing web sites with". It's a shame 
because I can say that about Python regarding almost any other 
application area. Maybe things will change as WSGI develops, but I can 
only comment on the draft that I see.

-- 
Ben Sizer.

From amk at amk.ca  Fri Aug 27 19:51:08 2004
From: amk at amk.ca (A.M. Kuchling)
Date: Fri Aug 27 19:51:31 2004
Subject: [Web-SIG] SIG charter
In-Reply-To: <412F6EAD.9050702@kylotan.eidosnet.co.uk>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com>
	<412F6EAD.9050702@kylotan.eidosnet.co.uk>
Message-ID: <20040827175108.GA29376@rogue.amk.ca>

On Fri, Aug 27, 2004 at 06:26:05PM +0100, Ben Sizer wrote:
> Does the suggestion in the Web-SIG charter no longer hold true then? I'm 

I think the charter was written by Bill Janssen, who doesn't seem to
be actively participating on the list any more.  The charter doesn't
necessarily bear any relevance to what the individuals in the SIG are
actually doing.

For example, the charter talks about client-side HTTP, too, but no one
is working on that aspect (even though there's no real competition in
this space the way there is for server-side things).

Is it worth updating the charter?  I have no idea what a new charter
would say...

--amk
From steve at holdenweb.com  Fri Aug 27 20:27:44 2004
From: steve at holdenweb.com (Steve Holden)
Date: Fri Aug 27 20:30:13 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
References: <412F5FE6.5000504@colorstudy.com>	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>	<412F2F71.4040608@kylotan.eidosnet.co.uk>	<412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
Message-ID: <412F7D20.5020201@holdenweb.com>

Phillip J. Eby wrote:

> At 12:33 PM 8/27/04 -0400, A.M. Kuchling wrote:
> 
>> It's unfortunate that WSGI probably isn't going to be finished in time
>> for Python 2.4, so that BaseHTTPServer or some similar class could
>> support it in the stdlib.  2.4alpha3 is scheduled for September 3rd,
>> and is planned to be the last alpha; no new features are introduced at
>> the beta stage, so that means WSGI support would have to wait until
>> Python 2.5.
> 
> 
> That's one week: BaseHTTPServer is HTTP/1.0-based if I recall correctly, 
> so whipping up support shouldn't take too long.  I have a draft 
> WSGIServer based on the December draft of the PEP, so it'd just have to 
> be beefed up.  Also, I think a CGI-based gateway (with some kind of 
> error handling) should go in, and perhaps the utility functions we 
> discussed previously.
> 
[...]
I am not sure that's correct. My 2.3.4 version contains the following 
comment:

"""HTTP server base class.

Note: the class in this module doesn't implement any HTTP request; see
SimpleHTTPServer for simple implementations of GET, HEAD and POST
(including CGI scripts).  It does, however, optionally implement 
HTTP/1.1 persistent connections, as of version 0.3.
"""

and there's code in there that only complains if the HTTP version is 
greater than 1.1.
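
For illustration, opting in to that HTTP/1.1 behaviour is a matter of
one class attribute. This sketch uses the Python 3 module name
http.server (the module was called BaseHTTPServer in 2.3); the handler
name KeepAliveHandler is hypothetical.

```python
from http.server import BaseHTTPRequestHandler


class KeepAliveHandler(BaseHTTPRequestHandler):
    # The stdlib default is "HTTP/1.0"; this opts in to HTTP/1.1,
    # which enables persistent connections.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # With a persistent connection the client cannot rely on the
        # socket closing, so the response length must be stated.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

It could be served with http.server.HTTPServer(("127.0.0.1", 8000),
KeepAliveHandler).serve_forever(); the address and port are arbitrary.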

Would be neat if you could do it, though it's a demanding and 
error-prone task to generate code on such short notice.

Good luck!

regards
  Steve
-- 
XXX Please note recent change of email address

From neel at mediapulse.com  Fri Aug 27 21:31:11 2004
From: neel at mediapulse.com (Michael C. Neel)
Date: Fri Aug 27 21:27:55 2004
Subject: [Web-SIG] SIG charter
In-Reply-To: <20040827175108.GA29376@rogue.amk.ca>
References: <5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<5.1.1.6.0.20040827090419.025ab160@mail.telecommunity.com>
	<412F6EAD.9050702@kylotan.eidosnet.co.uk>
	<20040827175108.GA29376@rogue.amk.ca>
Message-ID: <1093635070.1239.198.camel@mike.mediapulse.com>

On Fri, 2004-08-27 at 13:51, A.M. Kuchling wrote:
> On Fri, Aug 27, 2004 at 06:26:05PM +0100, Ben Sizer wrote:
> > Does the suggestion in the Web-SIG charter no longer hold true then? I'm 
> 
> I think the charter was written by Bill Janssen, who doesn't seem to
> be actively participating on the list any more.  The charter doesn't
> necessarily bear any relevance to what the individuals in the SIG are
> actually doing.
> 

When first started, there were a lot of ideas thrown about, but I think
there was also a lot of 'sniping' going on, which led to a very quiet
period on the list.  I don't think most are here to fight for ideals,
just to write code and solve some common problems.

The current charter is too narrow in focus, and carries a slight bias
toward suggested solutions.  If I were asked what I think the charter
should say, it would be along the lines of reviewing, updating, and
adding modules to the Python standard library that relate to the web
and web-based technologies, and at the same time defining recommended
standards around these technologies.

The list is doing the latter, but not the former, which is a shame.
There are several "common domain" problems that could be addressed, but
aren't.  One example is cookies: there is a Python stdlib module for
cookies, and yet mod_python has its own cookie module.  This suggests
the stdlib module should be reviewed, because it isn't providing what
mod_python needs.  Considering any framework will need to address
cookies, this is something that makes sense to address on a Web SIG.

There have also been ideas for servers and clients in the stdlib (some
already exist, so they would be built upon and expanded) and
even a mention of Python applets.  I think all of these should also fall
into the Web SIG charter, and at least merit discussion.

I think the WSGI is a good concept and idea, but not a burning issue.
In practice, I have never had the need to port an application /
framework across servers and platforms.  If that was part of the scope,
it came along with a complete rewrite, so I really didn't want to keep
the prior application's code and possibly its framework.  Also, many
frameworks have placed the server-specific code into a controller class
that can be quickly subclassed and taken to a new server.

Mike


From pje at telecommunity.com  Fri Aug 27 21:32:18 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 21:31:58 2004
Subject: [Web-SIG] Regarding the WSGI draft
In-Reply-To: <412F7D20.5020201@holdenweb.com>
References: <5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
	<412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
	<412F2F71.4040608@kylotan.eidosnet.co.uk>
	<412F5FE6.5000504@colorstudy.com>
	<5.1.1.6.0.20040827123801.02780ad0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827152156.03835660@mail.telecommunity.com>

At 02:27 PM 8/27/04 -0400, Steve Holden wrote:
>I am not sure that's correct. My 2.3.4 version contains the following comment:
>
>"""HTTP server base class.
>
>Note: the class in this module doesn't implement any HTTP request; see
>SimpleHTTPServer for simple implementations of GET, HEAD and POST
>(including CGI scripts).  It does, however, optionally implement HTTP/1.1 
>persistent connections, as of version 0.3.
>"""
>
>and there's code in there that only complains if the HTTP version is 
>greater than 1.1.

It's not anywhere *near* RFC-compliant, though, based on our discussions of 
RFC 2616 here as regards e.g. "100 expect/continue".


>Would be neat if you could do it, though it's a demanding and error-prone 
>task to generate code on such short notice.

It wouldn't be that short; there's already a WSGIServer.py in my CVS based 
on the December draft; the differences between that and today's WSGI are 
minor when it comes to the semantics.  It doesn't really offer decent error 
handling, but then again neither does BaseHTTPServer or CGIServer.

From pje at telecommunity.com  Fri Aug 27 21:49:30 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 21:49:04 2004
Subject: [Web-SIG] FYI: PEP 333 posted to python-dev and python-list
Message-ID: <5.1.1.6.0.20040827154658.01efaec0@mail.telecommunity.com>

Anything we talked about in the last two days or so isn't in it yet, as 
this is the version I submitted to the PEP editors.

From pf_moore at yahoo.co.uk  Fri Aug 27 22:40:50 2004
From: pf_moore at yahoo.co.uk (Paul Moore)
Date: Fri Aug 27 22:40:40 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
Message-ID: <u8yc048ml.fsf@yahoo.co.uk>

Ben Sizer <brsizer@kylotan.eidosnet.co.uk> writes:

> I've read through the draft and most of the messages on this list that 
> followed it. However, I have a basic problem with it which I will 
> attempt to summarise below.

[...]

> What I'd like to see is something mirroring the Python Database API. For 
> instance, I might have to change "import MySQLdb" to "import pyPgSQL" 
> but I know that 99% of the rest of the database code will work fine. As 
> a web developer I would like to be able to change "import cgi" to 
> "import mod_python" or "import fastcgi" and know that, if I follow a 
> standard set of calls, I will have a simple and standard way of 
> producing a web document.

I have some reservations, as well. My perspective is as a web
application *consumer* rather than a developer. I have a server which
runs MoinMoin and Roundup (among other web apps). MoinMoin runs under
mod_python, whereas Roundup runs as its own server, accessed via
Apache and mod_proxy. If I wanted to add PyBlosxom, I'd need to run
it as CGI (which, given the server hardware, is horribly slow). The
variety of servers and backends gets hard to manage (and that's just
with 3 applications!)

I'd much prefer to only use one underlying architecture (probably
mod_python), but Roundup and PyBlosxom don't support it. Ben's idea
of application writers being able to easily support multiple servers,
much like the DB API supports multiple backends, would be a real
bonus for me, as it would make it far more likely that I could do
something like this. (Either because application writers would
include additional support, or because it would be simple enough for
me to add it myself).

I get the impression that the WSGI idea of layering and middleware
might make this more likely in the longer term, but I don't see how it
might happen. It certainly doesn't make it seem like something I could
do for myself with an existing application. Maybe I'm missing
something crucial here, but I'd certainly like to see this clarified,
if it's the case.

Paul
-- 
"Bother," said the Borg, "We've assimilated Pooh."

From pf_moore at yahoo.co.uk  Fri Aug 27 22:50:26 2004
From: pf_moore at yahoo.co.uk (Paul Moore)
Date: Fri Aug 27 22:50:16 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
Message-ID: <u4qmo486l.fsf@yahoo.co.uk>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> First, users can experiment with other frameworks, especially if those 
> frameworks are lightweight.  This builds competitive pressure in the 
> direction of lightweight, easy-to-integrate frameworks.  So framework 
> developers begin to break their monolithic approaches down into smaller 
> pieces that operate on segments of WSGI.  For example, a session service 
> that you pass the incoming 'environ' and outgoing 'headers' to, in order 
> for it to read and set cookies.  (Notice that this *isn't* a WSGI-defined 
> or standardized service, just a service implemented *in terms of* WSGI.)

I think this starts to address the question I raised in my previous
posting, about "run anywhere" applications. If an application is
written to use WSGI-compliant services, it could run on any
WSGI-compliant server.
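
As a rough illustration of the kind of service described in the quoted
paragraph, here is a sketch of a session helper that is handed the
incoming 'environ' and the outgoing 'headers'. Everything in it is an
assumption made for the example: the names get_session and counter_app,
the "sid" cookie, and the in-memory store. It uses the Python 3 module
http.cookies (named Cookie in the 2004 stdlib).

```python
import uuid
from http.cookies import SimpleCookie

_SESSIONS = {}  # hypothetical in-memory store; lost on restart


def get_session(environ, headers, cookie_name="sid"):
    """Return the session dict for this request, creating one if needed.

    Reads the cookie from `environ` and, for a new session, appends a
    Set-Cookie entry to the outgoing `headers` list.
    """
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get(cookie_name)
    sid = morsel.value if morsel is not None else None
    if sid not in _SESSIONS:
        sid = uuid.uuid4().hex
        _SESSIONS[sid] = {}
        headers.append(("Set-Cookie", "%s=%s; Path=/" % (cookie_name, sid)))
    return _SESSIONS[sid]


def counter_app(environ, start_response):
    """An application that uses the service; it needs no particular server."""
    headers = [("Content-Type", "text/plain")]
    session = get_session(environ, headers)
    session["hits"] = session.get("hits", 0) + 1
    start_response("200 OK", headers)
    return [("hits: %d\n" % session["hits"]).encode("ascii")]
```

The helper depends only on the environ dictionary and a header list, so
it is a service written in terms of WSGI rather than part of WSGI.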

But doesn't this raise a complementary issue? With 10 applications
running, I have one server. But I also have 5 session handling
services, 8 authentication services, 3 error handling services, etc,
etc. Maybe that's where the pressure for "best of breed" services
comes from.

Small steps, I guess...

Paul.
-- 
The major difference between a thing that might go wrong and a thing
that cannot possibly go wrong is that when a thing that cannot
possibly go wrong goes wrong it usually turns out to be impossible to
get at or repair. -- Douglas Adams

From pje at telecommunity.com  Fri Aug 27 23:00:45 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 23:00:24 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <u8yc048ml.fsf@yahoo.co.uk>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
Message-ID: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>

At 09:40 PM 8/27/04 +0100, Paul Moore wrote:
>I have a server which
>runs MoinMoin and Roundup (among other web apps). MoinMoin runs under
>mod_python, whereas Roundup runs as its own server, accessed via
>Apache and mod_proxy. If I wanted to add PyBlosxom, I'd need to run
>it as CGI (which, given the server hardware, is horribly slow). The
>variety of servers and backends gets hard to manage (and that's just
>with 3 applications!)
>
>I'd much prefer to only use one underlying architecture (probably
>mod_python), but Roundup and PyBlosxom don't support it. Ben's idea
>of application writers being able to easily support multiple servers,
>much like the DB API supports multiple backends, would be a real
>bonus for me, as it would make it far more likely that I could do
>something like this. (Either because application writers would
>include additional support, or because it would be simple enough for
>me to add it myself).
>
>I get the impression that the WSGI idea of layering and middleware
>might make this more likely in the longer term, but I don't see how it
>might happen. It certainly doesn't make it seem like something I could
>do for myself with an existing application. Maybe I'm missing
>something crucial here, but I'd certainly like to see this clarified,
>if it's the case.

Well, if you can identify the top-level control point of PyBlosxom and 
Roundup, you can always try converting them to WSGI.  But, maybe if there's 
a stdlib module for WSGI utilities, a useful one would probably be 
something to run some code in such a way that it thinks it's running under 
CGI, even though it's really running under WSGI.  The degree to which this 
could be assured is of course dependent on precisely what the application 
*does*, but getting 80% of CGIs (that don't depend on some kind of global 
state that isn't reset after each execution) to be able to run in arbitrary 
WSGI servers would be a handy thing, and most appropriate for the stdlib.
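
To make the idea concrete, here is a very rough sketch of such a
wrapper, assuming the interface as later published in PEP 3333. The
name cgi_to_wsgi is hypothetical, and cgi_main stands for whatever
top-level function the CGI program exposes. The sketch ignores request
bodies (nothing is wired to stdin), normalises line endings in the
output, and swaps process-wide globals, so it is not safe in a threaded
server.

```python
import io
import os
import sys


def cgi_to_wsgi(cgi_main):
    """Wrap a CGI-style callable so it can be served as a WSGI app."""

    def app(environ, start_response):
        saved_env = dict(os.environ)
        saved_stdout = sys.stdout
        captured = io.StringIO()
        try:
            # Expose the string-valued request variables the way CGI would.
            os.environ.update({k: v for k, v in environ.items()
                               if isinstance(v, str) and not k.startswith("wsgi.")})
            sys.stdout = captured
            cgi_main()
        finally:
            # Undo the global changes no matter how the script exited.
            sys.stdout = saved_stdout
            os.environ.clear()
            os.environ.update(saved_env)

        # CGI output is a header block, a blank line, then the body.
        raw = captured.getvalue().replace("\r\n", "\n")
        head, _, body = raw.partition("\n\n")
        status = "200 OK"
        headers = []
        for line in head.splitlines():
            name, _, value = line.partition(":")
            if name.strip().lower() == "status":
                status = value.strip()
            else:
                headers.append((name.strip(), value.strip()))
        start_response(status, headers)
        return [body.encode("utf-8")]

    return app
```

Scripts that keep global state between runs, or that call sys.exit()
or read from stdin, would need more care than this.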

Anybody want to volunteer to write it?  ;)  If it helps, WSGIServer has 
some code for parsing stdout headers; see:

http://cvs.eby-sarna.com/PEAK/src/peak/util/WSGIServer.py?rev=1.3&content-type=text/vnd.viewcvs-markup

in the WSGIRequestHandler class.  (Note: this is the code I mentioned 
that's based on the December WSGI draft, where the response status and 
headers were embedded in the output stream rather than being function 
arguments.  So don't use it as an example of a proper WSGI server at the 
moment!)

(Offtopic, I'd note that a major reason PyBlosxom is slow may have nothing 
to do with CGI: my offhand impression of its code is that it appears to 
scan through file directories for every page rendering, just to do things 
like find what "flavours" might be defined in some of your post 
directories.  But I could be wrong, and there may be some "caching" plugins 
that would help this.)

From pje at telecommunity.com  Fri Aug 27 23:12:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 23:12:29 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <u4qmo486l.fsf@yahoo.co.uk>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040826150415.02cb0d80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827170109.022bacb0@mail.telecommunity.com>

At 09:50 PM 8/27/04 +0100, Paul Moore wrote:
>"Phillip J. Eby" <pje@telecommunity.com> writes:
>
> > First, users can experiment with other frameworks, especially if those
> > frameworks are lightweight.  This builds competitive pressure in the
> > direction of lightweight, easy-to-integrate frameworks.  So framework
> > developers begin to break their monolithic approaches down into smaller
> > pieces that operate on segments of WSGI.  For example, a session service
> > that you pass the incoming 'environ' and outgoing 'headers' to, in order
> > for it to read and set cookies.  (Notice that this *isn't* a WSGI-defined
> > or standardized service, just a service implemented *in terms of* WSGI.)
>
>I think this starts to address the question I raised in my previous
>posting, about "run anywhere" applications. If an application is
>written to use WSGI-compliant services, it could run on any
>WSGI-compliant server.
>
>But doesn't this raise a complementary issue? With 10 applications
>running, I have one server. But I also have 5 session handling
>services, 8 authentication services, 3 error handling services, etc,
>etc. Maybe that's where the pressure for "best of breed" services
>comes from.
>
>Small steps, I guess...

Right.  Journey of a thousand miles, single step, that sort of 
thing.  :)  Anyway, once you have 5, 8, 3, etc. things that are focused on 
specific areas, you have an opportunity for *focused* discussion on that 
area, and a chance of making some progress on a standard.  Right now, WSGI 
is focused intently on HTTP, because that's the *only* thing everybody's 
definitely got in common.

When WSGI is also "common", then it's easy to look at other layers, because 
the server differences are factored out.  So, we can pull out another 
layer, making the *next* layer up come into sharper focus, and so on.

And, as you say, the duplication *will* provide a new kind of market 
pressure, to reduce duplication and consolidate the choices.  The overall 
process is somewhat organic, I think, but it has to be started in a way 
that will take advantage of the forces currently in play (e.g. developer 
interest, existing investments, etc.) rather than working against them.
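
To make the quoted "session service" idea concrete, here is a rough sketch of a helper that is handed the incoming 'environ' and the outgoing header list and does nothing except read and set a cookie.  It is an illustration only: the names SessionService and SESSION_COOKIE and the in-memory store are invented for this sketch, nothing here is defined by WSGI, and it is written for current Python 3 rather than the Python of this thread.

```python
import uuid
from http.cookies import SimpleCookie

SESSION_COOKIE = "session_id"   # hypothetical cookie name


class SessionService:
    """Toy in-memory session store driven only by environ and headers."""

    def __init__(self):
        self._sessions = {}

    def load(self, environ, headers):
        """Return the session dict for this request.

        Reads the cookie from environ['HTTP_COOKIE'].  If no known session
        is found, a new one is created and a Set-Cookie tuple is appended
        to the outgoing 'headers' list.
        """
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        morsel = cookie.get(SESSION_COOKIE)
        if morsel is not None and morsel.value in self._sessions:
            return self._sessions[morsel.value]
        sid = uuid.uuid4().hex
        self._sessions[sid] = {}
        headers.append(("Set-Cookie", "%s=%s; Path=/" % (SESSION_COOKIE, sid)))
        return self._sessions[sid]
```

An application would call load(environ, headers) before start_response(), so the service never needs to know which server or framework it is running under.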

From ianb at colorstudy.com  Fri Aug 27 23:15:06 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri Aug 27 23:15:40 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
References: <412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
Message-ID: <412FA45A.8010809@colorstudy.com>

Phillip J. Eby wrote:
> Well, if you can identify the top-level control point of PyBlosxom and 
> Roundup, you can always try converting them to WSGI.  But, maybe if 
> there's a stdlib module for WSGI utilities, a useful one would probably 
> be something to run some code in such a way that it thinks it's running 
> under CGI, even though it's really running under WSGI.  The degree to 
> which this could be assured is of course dependent on precisely what the 
> application *does*, but getting 80% of CGIs (that don't depend on some 
> kind of global state that isn't reset after each execution) to be able 
> to run in arbitrary WSGI servers would be a handy thing, and most 
> appropriate for the stdlib.

I happened to be playing with just such a thing:

http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log

There's a few parts I kind of punted on, though now that I think about 
it I know what I did wrong, so I'll fix it a bit this evening.  Anyway, 
it's intended to work both for multiprocess (e.g., mod_python) and 
threaded servers, with decreasing likelihood that any particular script 
will actually work.

But I haven't yet tested it under anything but CGI, so it really 
*should* work ;)  I'll try running it with your WSGIServer and see how 
it goes.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From pje at telecommunity.com  Fri Aug 27 23:17:23 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 23:17:03 2004
Subject: [Web-SIG] FYI: PEP 333 posted to python-dev and python-list
In-Reply-To: <5.1.1.6.0.20040827154658.01efaec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827171300.022a9d10@mail.telecommunity.com>

At 03:49 PM 8/27/04 -0400, Phillip J. Eby wrote:
>Anything we talked about in the last two days or so isn't in it yet, as 
>this is the version I submitted to the PEP editors.

Argh.  It bounced back to me for length reasons.  :(  I'm going to have to 
refer people to the online text.

From pje at telecommunity.com  Fri Aug 27 23:36:21 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 23:35:59 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <412FA45A.8010809@colorstudy.com>
References: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>

At 04:15 PM 8/27/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>Well, if you can identify the top-level control point of PyBlosxom and 
>>Roundup, you can always try converting them to WSGI.  But, maybe if 
>>there's a stdlib module for WSGI utilities, a useful one would probably 
>>be something to run some code in such a way that it thinks it's running 
>>under CGI, even though it's really running under WSGI.  The degree to 
>>which this could be assured is of course dependent on precisely what the 
>>application *does*, but getting 80% of CGIs (that don't depend on some 
>>kind of global state that isn't reset after each execution) to be able to 
>>run in arbitrary WSGI servers would be a handy thing, and most 
>>appropriate for the stdlib.
>
>I happened to be playing with just such a thing:
>
>http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log

Wow; you certainly thought this through more than I did.  E.g. your model 
for dealing with threads.  The part for dealing with missing 
threads/threading should probably use dummy_threading, though.

Also, I notice that you're using multiple 'environ' replacements for some 
reason, even though any use of it is going to be wrapped in a global thread 
lock.  So, the '_environs' dictionary seems superfluous.  A minimal 
approach to adjusting the FieldStorage class could be:

    cgi.FieldStorage.__init__.im_func.func_defaults=(None,None,"",environ,0,0)

which will make the default environ be the one you want.  Doing that, and 
replacing 'os.environ' for the request's duration (and putting them back 
when done) seem to be required.  (Note that code that doesn't use the 'cgi' 
module but directly checks os.environ won't work with the current state of 
things.)

Hm.  Actually, I just looked and it looks like you're not wrapping the 
execution in the global mutex, but you really need to because not only are 
sys.std* global, CGI apps aren't generally written to be multithreaded.
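
For illustration, the "global mutex plus put everything back" approach might look roughly like the sketch below.  This is not the pycgiwrapper code; run_as_cgi and _cgi_lock are names made up here, the CGI program is modelled as a plain callable, and the code targets current Python 3.

```python
import io
import os
import sys
import threading

_cgi_lock = threading.Lock()   # only one CGI-style request runs at a time


def run_as_cgi(script, environ, body=""):
    """Call script() with os.environ and sys.std* replaced; return its output.

    'script' is any callable that reads os.environ / sys.stdin and prints
    to sys.stdout, the way a CGI program would.
    """
    with _cgi_lock:
        saved = os.environ, sys.stdin, sys.stdout
        # Keep only string-valued CGI variables; skip the 'wsgi.*' objects.
        os.environ = {k: v for k, v in environ.items() if isinstance(v, str)}
        sys.stdin = io.StringIO(body)
        sys.stdout = captured = io.StringIO()
        try:
            script()
        finally:
            # Put the globals back even if the script raised.
            os.environ, sys.stdin, sys.stdout = saved
        return captured.getvalue()
```

Because the lock serializes requests, code that reads os.environ directly is covered as well, at the cost of running only one such script at a time.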

Nice design otherwise, though.  I particularly like that if you don't want 
to run a script file, you can just override 'run_script' to do whatever the 
request body is.


>There's a few parts I kind of punted on, though now that I think about it 
>I know what I did wrong, so I'll fix it a bit this evening.  Anyway, it's 
>intended to work both for multiprocess (e.g., mod_python) and threaded 
>servers, with decreasing likelihood that any particular script will 
>actually work.
>
>But I haven't yet tested it under anything but CGI, so it really *should* 
>work ;)  I'll try running it with your WSGIServer and see how it goes.

Don't forget: WSGIServer has *not* been updated to PEP 333 yet; it's still 
based on the old streaming approach!  I've been too busy updating the spec 
(and replying to every thread) to update the code.  :)

From pje at telecommunity.com  Fri Aug 27 23:59:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri Aug 27 23:59:29 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
Message-ID: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>

I don't know if it's possible for us to get these items together in time 
for 2.4; if we don't, we don't.  There's little harm in having a separate 
'wsgi' distribution until 2.5 rolls around.  I'm thinking the package 
should include:

  * BaseHTTPServer-based WSGI server
  * CGI-based WSGI gateway (run WSGI apps under CGI)
  * WSGI app that wraps CGI applications so they can run under WSGI
  * Utility routines to fulfill certain parts of the spec's requirements
  * HTTP/1.1 practice guidelines, and utility routines where appropriate
  * Documentation

This looks like quite a list to do in just a few days, despite the fact 
that we have skeleton implementations of the first four items and part of 
the fifth.  And that's completely ignoring these currently outstanding 
issues in the PEP itself:

   * List-of-tuples vs. email.Message for outgoing headers
   * Exception handling

Plus, I'm a couple days behind in updates to reflect the SIG's current 
consensus on other outstanding issues, and haven't done anything to 
separate the HTTP/1.1 guidelines out.

Anyway, we really need to finish the outstanding open issues, because until 
the spec is firm on those items, we're coding on sand in those areas.

I personally would like to use email.Message, and I'm even tempted to make 
'Status' a header, so that it's just 'start_response(headers)' instead of 
'start_response(status,headers)'.  The Content-Transfer-Encoding 
boilerplate is only needed by servers and gateways, and I don't think 
adding another two lines of code creates a big burden there.  But it makes 
middleware's job a lot easier: just add or modify headers, rather than 
having to turn the sequence of headers into some other structure and back 
again, or having to write utility routines to duplicate the functionality 
already in email.Message.
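
To show the difference for middleware, here is a small sketch of "replace one header" in both representations.  The helper names are invented for the example, and it uses the current Python 3 spelling (email.message.Message) rather than the module layout of this thread's Python.

```python
from email.message import Message


def set_header_list(headers, name, value):
    """List-of-tuples style: rebuild the list to replace a header."""
    kept = [(k, v) for k, v in headers if k.lower() != name.lower()]
    kept.append((name, value))
    return kept


def set_header_message(msg, name, value):
    """Message style: lookup is already case-insensitive."""
    del msg[name]        # no error if the header is absent
    msg[name] = value    # __setitem__ appends, so delete first
    return msg
```

The list version is short, but every piece of middleware needs some version of it; the Message version relies on behaviour the class already provides.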

With regard to exception handling, Ian has pointed out that it's hard for 
middleware to trap exceptions well, because it can't tell whether the next 
app down the chain has written headers yet, unless it replaces 
'start_response', which then means it disables any advanced server APIs.

After thinking about this for a while, I'm having trouble seeing a problem 
with that.  Specifically, exception-catching middleware *is* modifying the 
output mechanism, because it will change the output in that case.  It 
doesn't seem to me that you can safely write exception-catching middleware 
that can work without disabling the use of extension APIs for application 
output.

The only other thing that comes to mind is requiring servers to support 
multiple 'start_response' calls in some way that makes sense for exception 
handlers, while requiring it to still work in the case where an extension 
API has already been used for output.
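
A minimal sketch of the wrapping approach discussed above, assuming the two-argument start_response(status, headers) form used in this thread: the middleware replaces 'start_response' only to learn whether headers have gone out yet.  The name catch_errors is made up for this example, the body is buffered for simplicity, and the code is written for current Python 3.

```python
import sys
import traceback


def catch_errors(app):
    """Wrap 'app' so an uncaught exception becomes a 500 response."""

    def middleware(environ, start_response):
        started = []                 # non-empty once headers were sent

        def tracking_start_response(status, headers):
            started.append(status)
            return start_response(status, headers)

        try:
            # Consume the body here so that errors raised while iterating
            # are caught too (this sketch buffers the whole response).
            result = app(environ, tracking_start_response)
            try:
                return list(result or [])
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            traceback.print_exc(file=environ.get("wsgi.errors", sys.stderr))
            if started:
                # Too late to change the status line; just end the body.
                return []
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return ["Internal server error\n"]

    return middleware
```

Note that the application only ever sees the wrapper, so any server extension API attached to the original 'start_response' is hidden from it, which is exactly the trade-off described above.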

From ianb at colorstudy.com  Sat Aug 28 00:09:11 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat Aug 28 00:09:22 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
References: <5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
Message-ID: <412FB107.7020702@colorstudy.com>

Phillip J. Eby wrote:
>> I happened to be playing with just such a thing:
>>
>> http://colorstudy.com/cgi-bin/viewcvs.cgi/trunk/WSGI/pycgiwrapper.py?rev=206&view=log 
>
> 
> Wow; you certainly thought this through more than I did.  E.g. your 
> model for dealing with threads.  The part for dealing with missing 
> threads/threading should probably use dummy_threading, though.

I figure if threads are missing, then we better not be in a 
wsgi.threaded environment, and if it's not threaded server then I don't 
use the threads or threading modules.

> Also, I notice that you're using multiple 'environ' replacements for 
> some reason, even though any use of it is going to be wrapped in a 
> global thread lock.  So, the '_environs' dictionary seems superfluous.  
> A minimal approach to adjusting the FieldStorage class could be:
> 
>    
> cgi.FieldStorage.__init__.im_func.func_defaults=(None,None,"",environ,0,0)
> 
> which will make the default environ be the one you want.  Doing that, 
> and replacing 'os.environ' for the request's duration (and putting them 
> back when done) seem to be required.  (Note that code that doesn't use 
> the 'cgi' module but directly checks os.environ won't work with the 
> current state of things.)

Maybe other people don't use os.environ, but I always have a lot when 
doing cgi scripts, so I want to handle that case.  It had actually never 
occurred to me before to access environ through the cgi module...

> Hm.  Actually, I just looked and it looks like you're not wrapping the 
> execution in the global mutex, but you really need to because not only 
> are sys.std* global, CGI apps aren't generally written to be multithreaded.

Well, there's basically two code paths, the multithreaded and the 
multiprocess.  I thought about the multithreaded more, but have only 
tested the multi-process.

In both cases I try to replace sys.stdout and os.environ.  I forgot to 
put things back the way they were with the multiprocess technique, so 
it's a little broken now.  With the multithreaded case, it uses the 
thread ID to figure out what stream or environment you will be looking 
at, so it doesn't need a lock around run_script -- each thread sees a 
stdout and environ that is appropriate for it.  I guess I could just 
change stdin too and not worry about fiddling with the cgi module at all.

Though it can cause problems.  E.g., if instead of the cgi server 
passing sys.stdout.write, it passed:

def write(s):
     sys.stdout.write(s)

That would cause all sorts of problems.  Unless it used 
sys.__stdout__.write(s); I don't know if that would be a good or bad 
style.  That's what I did to work around my bug.
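
As a rough illustration of the thread-ID technique described above (not the actual pycgiwrapper code), a stand-in for sys.stdout can look up the current thread and forward each write to that thread's own stream.  PerThreadStream and its method names are invented for this sketch, which uses current Python 3.

```python
import threading


class PerThreadStream:
    """File-like object that forwards writes to the current thread's stream."""

    def __init__(self, default):
        self._default = default      # used by threads that never registered
        self._streams = {}           # thread id -> stream

    def register(self, stream):
        """Route this thread's output to 'stream' until unregister()."""
        self._streams[threading.get_ident()] = stream

    def unregister(self):
        self._streams.pop(threading.get_ident(), None)

    def _target(self):
        return self._streams.get(threading.get_ident(), self._default)

    def write(self, s):
        return self._target().write(s)

    def flush(self):
        self._target().flush()
```

It would be installed once, e.g. sys.stdout = PerThreadStream(sys.__stdout__).  Passing sys.__stdout__ as the default matters for the reason given above: a gateway 'write' that calls sys.stdout.write(s) would otherwise be routed back into the proxy instead of reaching the real output.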

Anyway, the whole thing is a bit of a hack, so I don't expect it to work 
seamlessly with all scripts or all servers, though hopefully without 
heroic modifications it would be possible.  MoinMoin would be an 
excellent test, as I believe it is hopelessly bound to the cgi module, 
but would benefit nicely from running on a different environment, at 
least long-running multi-process.

> Don't forget: WSGIServer has *not* been updated to PEP 333 yet; it's 
> still based on the old streaming approach!  I've been too busy updating 
> the spec (and replying to every thread) to update the code.  :)

Hmm... I don't even remember what the old spec looked like anymore. 
I'll give it a look-see.

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From ianb at colorstudy.com  Sat Aug 28 02:00:18 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat Aug 28 02:00:25 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
Message-ID: <412FCB12.2030209@colorstudy.com>

Phillip J. Eby wrote:
> I don't know if it's possible for us to get these items together in time 
> for 2.4; if we don't, we don't.  

I can't imagine we would make it.  Hopefully we produce something for 
2.5, that can be installed on previous Python installations under the 
same name.  (Not like optparse/optik)

I would hope that we can come to some consensus and produce something 
useable before 2.5, with the understanding that it will be included in 
2.5.  I would kind of like to see a "web" package.

> There's little harm in having a 
> separate 'wsgi' distribution until 2.5 rolls around.  I'm thinking the 
> package should include:
> 
>  * BaseHTTPServer-based WSGI server
>  * CGI-based WSGI gateway (run WSGI apps under CGI)

You've noted these are missing error handling.  What kind were you 
thinking of specifically?

There's exception handling, which seems straightforward.  Spec 
compliance?  Certainly an anal version of these servers should be 
written, that checks every type passed around, looks for common 
mistakes, etc.  I don't know if the anal and the useable version need to 
be the same thing.

Did you have any other error cases you were thinking of?

>  * WSGI app that wraps CGI applications so they can run under WSGI

Two models -- one that optimistically tries to load the cgi module in a 
fake environment (what I did), plus another that actually runs any CGI 
script.  And maybe another one that forks and ultimately dies, can run 
any Python CGI script, but saves some startup time.  But that last one 
isn't that important.

>  * Utility routines to fulfill certain parts of the spec's requirements
>  * HTTP/1.1 practice guidelines, and utility routines where appropriate
>  * Documentation
> 
> This looks like quite a list to do in just a few days, despite the fact 
> that we have skeleton implementations of the first four items and part 
> of the fifth.  And that's completely ignoring these currently 
> outstanding issues in the PEP itself:
> 
>   * List-of-tuples vs. email.Message for outgoing headers
>   * Exception handling
> 
> Plus, I'm a couple days behind in updates to reflect the SIG's current 
> consensus on other outstanding issues, and haven't done anything to 
> separate the HTTP/1.1 guidelines out.
> 
> Anyway, we really need to finish the outstanding open issues, because 
> until the spec is firm on those items, we're coding on sand in those areas.
> 
> I personally would like to use email.Message, and I'm even tempted to 
> make 'Status' a header, so that it's just 'start_response(headers)' 
> instead of 'start_response(status,headers)'.  The 
> Content-Transfer-Encoding boilerplate is only needed by servers and 
> gateways, and I don't think adding another two lines of code creates a 
> big burden there.  But it makes middleware's job a lot easier: just add 
> or modify headers, rather than having to turn the sequence of headers 
> into some other structure and back again, or having to write utility 
> routines to duplicate the functionality already in email.Message.

If we use email.Message, using a status header seems fine.  If not, I 
think it should be separate -- I don't want to search a list for the 
status header.

I don't think the utility functions are a big deal at all, and I worry 
that there's some gotchas to email.Message, specifically where it is 
intended for email.  So I'm certainly not adamantly opposed to 
email.Message, but I'm not adamantly for it either.  I'd rather see a 
superclass of email.Message (such a superclass does not yet exist, but 
should be easy to write/extract) that is more minimal.

> With regard to exception handling, Ian has pointed out that it's hard 
> for middleware to trap exceptions well, because it can't tell whether 
> the next app down the chain has written headers yet, unless it replaces 
> 'start_response', which then means it disables any advanced server APIs.
> 
> After thinking about this for a while, I'm having trouble seeing a 
> problem with that.  Specifically, exception-catching middleware *is* 
> modifying the output mechanism, because it will change the output in 
> that case.  It doesn't seem to me that you can safely write 
> exception-catching middleware that can work without disabling the use of 
> extension APIs for application output.

To me it doesn't feel like the middleware is modifying the output.  It 
is augmenting the output in a case where there has been an unexpected 
failure.  I guess that could cause a problem, but then I think any 
middleware that is sensitive to the response being modified must still 
always allow for extra response coming in through normal channels.

But, I don't know.  I'm still up in the air.  Really, I just don't like 
wrapping start_response, from a mechanical point of view.  It feels 
awkward to me.  I wish I could just query the server as to what point in 
the response it is at.

> The only other thing that comes to mind is requiring servers to support 
> multiple 'start_response' calls in some way that makes sense for 
> exception handlers, while requiring it to still work in the case where 
> an extension API has already been used for output.

That seems too hard.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From mnot at mnot.net  Sat Aug 28 02:11:30 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sat Aug 28 02:11:33 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
Message-ID: <D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>

I'd be inclined to keep a separation between status and headers, so 
that one doesn't have to worry about collisions, namespace pollution, 
etc.

For that matter, my preference would be for environ to be split into 
(environ, request_method, request_url, request_headers) or similar. 
However, I know it's late, and I don't want to hold things up.

email.Message seems like a reasonable thing to do.


On Aug 27, 2004, at 2:59 PM, Phillip J. Eby wrote:

> I personally would like to use email.Message, and I'm even tempted to 
> make 'Status' a header, so that it's just 'start_response(headers)' 
> instead of 'start_response(status,headers)'.

--
Mark Nottingham     http://www.mnot.net/

From ianb at colorstudy.com  Sat Aug 28 02:22:11 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat Aug 28 02:22:15 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>
Message-ID: <412FD033.6060607@colorstudy.com>

Mark Nottingham wrote:
> For that matter, my preference would be for environ to be split into 
> (environ, request_method, request_url, request_headers) or similar. 
> However, I know it's late, and I don't want to hold things up.

I think this makes pass-through a bit harder, which I imagine could be 
fairly common.  And it also would add redundancy, since environ as 
defined by CGI contains all those other objects.  If not CGI variables, 
then we wouldn't be building on any particular spec.

Also, request_url isn't actually part of environ right now.  Instead 
there is SCRIPT_NAME and PATH_INFO, which provides important information 
about how to parse the URL.  There's also the (optional) REQUEST_URI, 
which I think is useful, but only advisory.
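
For example, an application that wants the full request URL can rebuild it from the CGI-style variables, roughly as in the sketch below.  The function name request_url is made up, and the 'wsgi.url_scheme' key is an assumption of this sketch (it falls back to "http" when absent); the code is for current Python 3.

```python
from urllib.parse import quote


def request_url(environ):
    """Rebuild the request URL from CGI-style variables in 'environ'."""
    scheme = environ.get("wsgi.url_scheme", "http")   # assumed key
    host = environ.get("HTTP_HOST")
    if not host:
        host = environ["SERVER_NAME"]
        port = environ["SERVER_PORT"]
        default = "443" if scheme == "https" else "80"
        if port != default:
            host += ":" + port
    url = scheme + "://" + host
    # SCRIPT_NAME is the part that located the application; PATH_INFO is
    # what remains for the application itself to interpret.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```

Keeping SCRIPT_NAME and PATH_INFO separate is what lets an application generate links relative to wherever it happens to be mounted.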

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From mnot at mnot.net  Sat Aug 28 02:54:00 2004
From: mnot at mnot.net (Mark Nottingham)
Date: Sat Aug 28 02:54:07 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <412FD033.6060607@colorstudy.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>
	<412FD033.6060607@colorstudy.com>
Message-ID: <BFE4BFC5-F88C-11D8-82BE-000A95BD86C0@mnot.net>

Could you expand on the problems that would be encountered with 
pass-through?

I don't think it would add redundancy in the CGI case, it would just 
require CGI WSGI servers to remove http headers from the environment 
and put them in the proper data structure.

WRT URIs, my preference (once again, just stating what I'd do, not 
saying that I think this MUST change) would be to base it on the 
underlying specs, and not make it so CGI-centric; i.e., have 'abs_path' 
and 'query' (these are the BNF productions in both 2396 and 2616), and 
that's it; anything else (e.g., script location) would be in the 
environment, and probably server-specific.

Cheers,


On Aug 27, 2004, at 5:22 PM, Ian Bicking wrote:

> Mark Nottingham wrote:
>> For that matter, my preference would be for environ to be split into 
>> (environ, request_method, request_url, request_headers) or similar. 
>> However, I know it's late, and I don't want to hold things up.
>
> I think this makes pass-through a bit harder, which I imagine could be 
> fairly common.  And it also would add redundancy, since environ as 
> defined by CGI contains all those other objects.  If not CGI variables, 
> then we wouldn't be building on any particular spec.
>
> Also, request_url isn't actually part of environ right now.  Instead 
> there is SCRIPT_NAME and PATH_INFO, which provides important 
> information about how to parse the URL.  There's also the (optional) 
> REQUEST_URI, which I think is useful, but only advisory.
>
> -- 
> Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
>

--
Mark Nottingham     http://www.mnot.net/

From floydophone at gmail.com  Sat Aug 28 03:56:53 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sat Aug 28 03:56:58 2004
Subject: [Web-SIG] My port of jonpy to current WSGI draft
Message-ID: <6654eac40408271856228642c2@mail.gmail.com>

I've taken the liberty to add a jonpy adapter to WSGI. In short, it
works. It's an example of a high-level, servlet interface to WSGI, and
it allows you to write real WSGI apps _now_, and also supports apps
that were written to run on other platforms.

My hello, world executed alright, though I haven't sufficiently tested
it yet. From the looks of it, however, I think it's a complete
implementation. The only design issue I see is that it doesn't use
yielding; it is push.
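
To illustrate the distinction, here is a sketch of the same response written both ways: "push" through the write() callable, and "pull" by yielding chunks for the server to iterate.  The function names and the tiny run() stand-in for a server are invented for this example; it is written for current Python 3 and uses plain strings as in this thread.

```python
def push_app(environ, start_response):
    """'Push' style: call the write() callable returned by start_response."""
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write("Hello, ")
    write("world!\n")
    return []


def pull_app(environ, start_response):
    """'Pull' style: a generator the server iterates to fetch each chunk."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield "Hello, "
    yield "world!\n"


def run(app):
    """Minimal stand-in for a server: collect everything the app outputs."""
    chunks = []

    def start_response(status, headers):
        return chunks.append          # plays the role of write()

    for data in app({}, start_response) or ():
        chunks.append(data)
    return "".join(chunks)
```

Both applications produce the same output under run(); the difference is only in who drives the loop.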

The attached files are:
- wsgicgi.py - the run_with_cgi method described in the pre-PEP,
except I fixed a typo and fixed some issues regarding the blank line
between the headers and content, and commented out the non-standard
Status: header.
- test.cgi - the hello, world test script, taken verbatim from the jonpy website
- jonpy_wsgi.py - the jonpy middleware I wrote


I know there's no unit tests or comments or docs, but it allows many
real world apps to run on WSGI _today_.

Tested on Windows XP, and IIS (so sue me ;) ).
-------------- next part --------------
import os, sys

def run_with_cgi(application):

    environ = {}
    environ.update(os.environ)
    environ['wsgi.input']        = sys.stdin
    environ['wsgi.errors']       = sys.stderr
    environ['wsgi.version']      = '1.0'
    environ['wsgi.multithread']  = False
    environ['wsgi.multiprocess'] = True

    def start_response(status,headers):
        #print "Status:", status
        for key,val in headers:
            print "%s: %s" % (key,val)
        print
        return sys.stdout.write

    result = application(environ, start_response)
    if result:
        try:
            for data in result:
                sys.stdout.write(data)
        finally:
            if hasattr(result,'close'):
                result.close()
               
-------------- next part --------------
from jon import cgi

class WSGIRequest(cgi.Request):
	"""An implementation of Request which is also a WSGI app."""
	def __init__(self, handler):
		cgi.Request.__init__(self, handler)
	def __call__(self, environ, start_response):
		self.environ = environ
		self.stdin = environ['wsgi.input']
		self.start_response = start_response
		self._writefunc = None
		cgi.Request._init(self)
		self.process()
	def process(self):
		"""Execute the handler"""
		self._init()
		try:
			handler = self._handler_type()
		except:
			self.traceback()
		else:
			try:
				handler.process(self)
			except:
				handler.traceback(self)
		self.close()
	def output_headers(self):
		self._writefunc = self.start_response("200 OK", self._headers)
	def error(self, s):
		self.environ['wsgi.errors'].write(s)
	def _write(self, s):
		assert self._writefunc is not None
		self._writefunc(s)

#def simple_app(environ, start_response):
#     """Simplest possible application object"""
#     status = '200 OK'
#     headers = [('Content-type','text/plain')]
#     write = start_response(status, headers)
#     write('Hello world!\n')

-------------- next part --------------
#!/usr/bin/env python

from jon import cgi
import wsgicgi
import jonpy_wsgi

# bit redundant: cgi->wsgi->cgi

class Handler(cgi.Handler):
	def process(self, req):
		req.set_header("Content-Type", "text/plain")
		req.write("Hello, %s!\n" % req.params.get("greet", "world"))

wsgicgi.run_with_cgi(jonpy_wsgi.WSGIRequest(Handler))
From floydophone at gmail.com  Sat Aug 28 03:58:56 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sat Aug 28 03:59:02 2004
Subject: [Web-SIG] Whoops! Quick little patch to my previous post
Message-ID: <6654eac404082718585af408ab@mail.gmail.com>

jonpy_wsgi.py:

I added a redundant call to cgi.Request._init(self) in __call__; feel
free to remove it.
From floydophone at gmail.com  Sat Aug 28 04:42:34 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Sat Aug 28 04:42:37 2004
Subject: [Web-SIG] My WSGIHTTPServer implementation
Message-ID: <6654eac404082719426d43aaf4@mail.gmail.com>

As the PEAK one is horribly out of date, I decided to implement a new
one. I don't know if this is exactly the interface you want, but it's
a start.

The way it works is you write a .py script, and include a module-level
"application" callable, which is your WSGI application. It will
execute it from there.

It's horribly insecure, but should help with people testing their WSGI
apps. The docs are scarce.

Attached is the implementation as well as the example app from the PEP.
-------------- next part --------------
#!/usr/bin/env python

def application(environ, start_response):
	 """Simplest possible application object"""
	 status = '200 OK'
	 headers = [('Content-type','text/plain')]
	 write = start_response(status, headers)
	 write('Hello world!\n')
-------------- next part --------------
"""WSGI-savvy HTTP Server.


SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL
-- it may execute arbitrary Python code or external programs.

"""


__version__ = "0.4"

__all__ = ["WSGIHTTPRequestHandler"]

import os
import sys
import urllib
import BaseHTTPServer
import SimpleHTTPServer
import select
import traceback

class WSGIHTTPRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):

    """Complete HTTP server with GET, HEAD and POST commands.

    GET and HEAD also support running CGI scripts.

    The POST command is *only* implemented for CGI scripts.

    """

    # Make rfile unbuffered -- we need to read one line and then pass
    # the rest to a subprocess, so we can't use buffered input.
    rbufsize = 0

    def do_POST(self):
        """Serve a POST request.

        This is only implemented for CGI scripts.

        """

        if self.is_cgi():
            self.run_cgi()
        else:
            self.send_error(501, "Can only POST to CGI scripts")

    def do_GET(self):
        if self.is_cgi():
            self.run_cgi()
        else:
            SimpleHTTPServer.SimpleHTTPRequestHandler.do_GET(self)

    def send_head(self):
        """Version of send_head that supports CGI scripts"""
        if self.is_cgi():
            return self.run_cgi()
        else:
            return SimpleHTTPServer.SimpleHTTPRequestHandler.send_head(self)

    def is_cgi(self):
        """Test whether self.path corresponds to a Python script
        """

        path = self.path

        if "?" in path:
            path = path[:path.rfind("?")]

        if self.is_python(path):
            i = path.rfind("/")
            if i == -1:
                self.cgi_info = "/",path
            else:
                self.cgi_info = path[:i],path[i+1:]
            #self.cgi_info = os.path.split(path)#path[:-1], path[-1]
            return True
        else:
            return False

    def is_python(self, path):
        """Test whether argument path is a Python script."""
        head, tail = os.path.splitext(path)
        return tail.lower() in (".py", ".pyw")

    def run_cgi(self):
        """Execute a CGI script."""
        dir, rest = self.cgi_info
        i = rest.rfind('?')
        if i >= 0:
            rest, query = rest[:i], rest[i+1:]
        else:
            query = ''
        i = rest.find('/')
        if i >= 0:
            script, rest = rest[:i], rest[i:]
        else:
            script, rest = rest, ''
            
        scriptname = dir + '/' + script
        scriptfile = self.translate_path(scriptname)
        if not os.path.exists(scriptfile):
            self.send_error(404, "No such CGI script (%s)" % `scriptname`)
            return
        if not os.path.isfile(scriptfile):
            self.send_error(403, "CGI script is not a plain file (%s)" %
                            `scriptname`)
            return
        ispy = self.is_python(scriptname)
        if not ispy:
            self.send_error(403, "CGI script is not a Python script (%s)" %
                            `scriptname`)
            return
        # Reference: http://hoohoo.ncsa.uiuc.edu/cgi/env.html
        # XXX Much of the following could be prepared ahead of time!
        env = {}
        env['SERVER_SOFTWARE'] = self.version_string()
        env['SERVER_NAME'] = self.server.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        env['SERVER_PROTOCOL'] = self.protocol_version
        env['SERVER_PORT'] = str(self.server.server_port)
        env['REQUEST_METHOD'] = self.command
        uqrest = urllib.unquote(rest)
        env['PATH_INFO'] = uqrest
        env['PATH_TRANSLATED'] = self.translate_path(uqrest)
        env['SCRIPT_NAME'] = scriptname
        if query:
            env['QUERY_STRING'] = query
        host = self.address_string()
        if host != self.client_address[0]:
            env['REMOTE_HOST'] = host
        env['REMOTE_ADDR'] = self.client_address[0]
        # XXX AUTH_TYPE
        # XXX REMOTE_USER
        # XXX REMOTE_IDENT
        if self.headers.typeheader is None:
            env['CONTENT_TYPE'] = self.headers.type
        else:
            env['CONTENT_TYPE'] = self.headers.typeheader
        length = self.headers.getheader('content-length')
        if length:
            env['CONTENT_LENGTH'] = length
        accept = []
        for line in self.headers.getallmatchingheaders('accept'):
            if line[:1] in "\t\n\r ":
                accept.append(line.strip())
            else:
                accept = accept + line[7:].split(',')
        env['HTTP_ACCEPT'] = ','.join(accept)
        ua = self.headers.getheader('user-agent')
        if ua:
            env['HTTP_USER_AGENT'] = ua
        co = filter(None, self.headers.getheaders('cookie'))
        if co:
            env['HTTP_COOKIE'] = ', '.join(co)
        # XXX Other HTTP_* headers
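        # Provide empty defaults for the optional variables, then layer the
        # per-request values over a copy of the process environment, so that
        # request data always wins over inherited variables.
        for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE'):
            env.setdefault(k, "")
        request_env = env
        env = os.environ.copy()
        env.update(request_env)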

        # now, set WSGI vars
        env['wsgi.input']        = self.rfile
        env['wsgi.errors']       = sys.stderr
        env['wsgi.version']      = '1.0'
        env['wsgi.multithread']  = False
        env['wsgi.multiprocess'] = True
        decoded_query = query.replace('+', ' ')

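        try:
            ns = {}
            execfile(scriptfile, ns, ns)
            result = ns["application"](env, self.start_response)
            try:
                # The application may also return its body as an iterable.
                for data in result or ():
                    self.wfile.write(data)
            finally:
                if hasattr(result, 'close'):
                    result.close()
        except:
            traceback.print_exc(file=sys.stderr)
            self.log_error("WSGI script could not be executed.")
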
    def start_response(self, status, headers):
        code,desc = status.split(" ",1)
        self.send_response(int(code), desc)
        for k,v in headers:
            self.wfile.write("%s: %s\r\n" % (k,v))
        self.wfile.write("\r\n")
        return self.wfile.write


def test(HandlerClass = WSGIHTTPRequestHandler,
         ServerClass = BaseHTTPServer.HTTPServer):
    SimpleHTTPServer.test(HandlerClass, ServerClass)


if __name__ == '__main__':
    test()
From pje at telecommunity.com  Sat Aug 28 05:13:43 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 28 05:13:34 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <412FCB12.2030209@colorstudy.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>

At 07:00 PM 8/27/04 -0500, Ian Bicking wrote:
>Phillip J. Eby wrote:
>>I don't know if it's possible for us to get these items together in time 
>>for 2.4; if we don't, we don't.
>
>I can't imagine we would make it.

You're probably right; it's just so tantalizingly close, as AMK mentioned.


>I would hope that we can come to some consensus and produce something 
>useable before 2.5, with the understanding that it will be included in 
>2.5.  I would kind of like to see a "web" package.

I think we'll have better luck with a 'wsgi' package, but I could be 
wrong.  'web' just seems like a nuisance attractor for all sorts of 
unproductive bickering on so many levels.

On a more immediate practical level, we'd be crazy to try to claim 'web' 
for a third-party package that we want to propose for the stdlib, but a 
package named 'wsgi' would be more than fair game.


>>There's little harm in having a separate 'wsgi' distribution until 2.5 
>>rolls around.  I'm thinking the package should include:
>>  * BaseHTTPServer-based WSGI server
>>  * CGI-based WSGI gateway (run WSGI apps under CGI)
>
>You've noted these are missing error handling.  What kind were you 
>thinking of specifically?
>
>There's exception handling, which seems straightforward.

Well, to be honest, I haven't a clue what one does about errors *after* the 
headers are written.  You can't send anything useful to the client, because 
the status is already set.

If you sent a Content-Length, you can break the connection before that 
point, and it's a fair guess the client will know something's wrong.  If 
you *didn't* send a content length and break the connection, the client 
gets an incomplete file and maybe doesn't know it.  Sending an error 
message once 'write()' has been called will garble the output.

All of these options are especially unsatisfactory when binary files are 
involved, where "unsatisfactory" could mean anything from "annoying" to 
"catastrophic" (e.g. garbling an executable).


>   Spec compliance?  Certainly an anal version of these servers should be 
> written, that checks every type passed around, looks for common mistakes, 
> etc.  I don't know if the anal and the useable version need to be the 
> same thing.

I wasn't even addressing spec compliance, although test suites for all the 
implementations, factored so that they could be used as a basis for testing 
other implementations, would certainly be nice.


>Two models -- one that optimistically tries to load the cgi module in a 
>fake environment (what I did), plus another that actually runs any CGI script.

I'm not following what the difference is, exactly, but I guess we'll need 
to get into the design more.


>If we use email.Message, using a status header seems fine.  If not, I 
>think it should be separate -- I don't want to search a list for the 
>status header.

Right, that's all I was thinking.


>I don't think the utility functions are a big deal at all, and I worry 
>that there's some gotchas to email.Message, specifically where it is 
>intended for email.  So I'm certainly not adamantly opposed to 
>email.Message, but I'm not adamantly for it either.  I'd rather see a 
>superclass of email.Message (such a superclass does not yet exist, but 
>should be easy to write/extract) that is more minimal.

Why don't you take a look at the code?  I have.  Here are the methods:

as_string, __str__ -- format the message as a string

is_multipart -- returns true if payload has been set to a list

get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, 
get_charsets, walk -- stuff for manipulating parts of the message we don't 
care about.

set_charset/get_charset -- sets the character set parameters of the 
content-type, which is actually useful.  On the down side, setting the 
character set sets MIME-Version, but it also sets the 
Content-Transfer-Encoding, so it doesn't force the server to default one.

__len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, get, 
keys, values, items -- case-insensitive dictionary-like interface (i.e., 
the stuff we mainly want)

get_all -- all values for a header name

add_header, replace_header -- more stuff we want

get_type, get_main_type, get_subtype, get_content_type, 
get_content_maintype, get_content_subtype, get_param, 
get_params, set_param, del_param, set_type, get_boundary, set_boundary, 
get_content_charset -- miscellaneous content-type analysis and 
manipulation.  Not necessarily very helpful, except maybe for 
middleware.  But they hardly hurt.

get_filename -- extract filename from Content-Disposition if present.  Not 
particularly helpful, but also not damaging in any way.


Perhaps more eyes should look at this, but I haven't found anything in here 
that's damaging or even annoying apart from setting MIME-Version if it's 
not there and the content-type is touched.
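
To make the dictionary-like part of that interface concrete, here is a
small sketch of response headers held in a Message object.  It is written
against the modern spelling of the module (email.message in Python 3; the
2004 module was email.Message), and the header values are made up for
illustration.

```python
from email.message import Message

headers = Message()
headers['Content-Type'] = 'text/html'
headers.add_header('Set-Cookie', 'a=1')
headers.add_header('Set-Cookie', 'b=2')

# Lookups ignore case; get_all returns every value of a repeated header.
content_type = headers['content-type']
cookies = headers.get_all('set-cookie')

# Content-type parameters can be set and read back without string munging.
headers.set_param('charset', 'utf-8')
charset = headers.get_content_charset()

# items() yields (name, value) pairs, the shape a server would write out.
header_list = headers.items()
```

Note that set_param re-adds Content-Type, so the order of header_list is
not necessarily the order in which the headers were first set.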


>But, I don't know.  I'm still up in the air.  Really, I just don't like 
>wrapping start_response, from a mechanical point of view.  It feels 
>awkward to me.  I wish I could just query the server as to what point in 
>the response it is at.

Well, we could offer a facility for that, but first I'd like to explore 
what error handling should *do* in different situations.


>>The only other thing that comes to mind is requiring servers to support 
>>multiple 'start_response' calls in some way that makes sense for 
>>exception handlers, while requiring it to still work in the case where an 
>>extension API has already been used for output.
>
>That seems too hard.

Well, to some extent we have to look at the question of what should happen 
in those circumstances anyway, whether we solve the problem in that 
specific way or not.  Because if the application *does* call start_response 
more than once, the server has to be able to handle it *somehow*.  Really, 
the ultimate error handling *has* to be done by servers, unless they want 
to take the route of crashing the entire process when something bad 
happens.  :)

From pje at telecommunity.com  Sat Aug 28 05:18:17 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 28 05:18:03 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <412FB107.7020702@colorstudy.com>
References: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com>

At 05:09 PM 8/27/04 -0500, Ian Bicking wrote:
>Though it can cause problems.  E.g., if instead of the cgi server passing 
>sys.stdout.write, it passed:
>
>def write(s):
>     sys.stdout.write(s)
>
>That would cause all sorts of problems.  Unless it used 
>sys.__stdout__.write(s); I don't know if that would be a good or bad 
>style.  That's what I did to work around my bug.

There's another way...  make the dummy file object put in for sys.stdout do 
this:

     def write(self,data):
         sys.stdout = self.__oldstdout__
         try:
             self.wsgi_writefunc(data)
         finally:
             sys.stdout = self

Voila.  Now, even if the WSGI server is written to use stdout, it still 
works.  The same trick can and should be used for stdin and stderr.

It's messy, but it should suffice.  Actually, to be a really decent 
emulation, the dummy stdout.write() should probably buffer the data, and 
look for flush() before calling the wsgi_writefunc.  Assuming it's not 
still buffering headers.  But I digress.  Clearly, CGI is a pain in the, 
er... gateway.  :)
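
A fuller, runnable version of that trick might look like the sketch below.
The names StdoutProxy, run_with_proxy and server_write are invented for
the example, and the buffering and flush() handling mentioned above are
left out.

```python
import io
import sys


class StdoutProxy:
    """Stand-in for sys.stdout that forwards writes to a WSGI write callable."""

    def __init__(self, wsgi_writefunc, real_stdout):
        self.wsgi_writefunc = wsgi_writefunc
        self.real_stdout = real_stdout

    def write(self, data):
        # Put the real stdout back while the server's write function runs,
        # so a server that itself writes to sys.stdout does not recurse.
        sys.stdout = self.real_stdout
        try:
            self.wsgi_writefunc(data)
        finally:
            sys.stdout = self

    def flush(self):
        pass


def run_with_proxy(script, wsgi_writefunc):
    """Call script() with sys.stdout replaced by a StdoutProxy."""
    real_stdout = sys.stdout
    sys.stdout = StdoutProxy(wsgi_writefunc, real_stdout)
    try:
        script()
    finally:
        sys.stdout = real_stdout


# Hypothetical server write function; it checks that it never sees the proxy.
sink = io.StringIO()


def server_write(data):
    assert not isinstance(sys.stdout, StdoutProxy)
    sink.write(data)


run_with_proxy(lambda: print("hello"), server_write)
```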

From pje at telecommunity.com  Sat Aug 28 05:22:01 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 28 05:21:49 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040827231832.030f8e60@mail.telecommunity.com>

At 05:11 PM 8/27/04 -0700, Mark Nottingham wrote:
>I'd be inclined to keep a separation between status and headers, so that 
>one doesn't have to worry about collisions, namespace pollution, etc.

It's only a collision if some future version of HTTP decides to use 
'Status:' as a response header, in which case CGI is in trouble.  :)


>For that matter, my preference would be for environ to be split into 
>(environ, request_method, request_url, request_headers) or similar. 
>However, I know it's late, and I don't want to hold things up.

Don't worry about the lateness.  Let's do it right.

That having been said, I've previously mentioned these reasons for *not* 
doing request headers and suchlike:

1. lots of code in-the-field knows how to do sensible things with CGI 
variables, but not HTTP headers

2. HTTP doesn't differentiate between "target of this request" and "where 
the application is", but CGI does (SCRIPT_NAME + PATH_INFO)
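
As an illustration of point 2, the sketch below rebuilds both URLs from
CGI variables alone.  The function name split_urls and the fixed 'http'
default are assumptions made for the example, not part of any draft.

```python
from urllib.parse import quote


def split_urls(environ, scheme='http'):
    """Return (application_url, request_url) built from CGI variables."""
    host = environ['SERVER_NAME']
    if environ['SERVER_PORT'] != '80':
        host += ':' + environ['SERVER_PORT']
    # SCRIPT_NAME says where the application lives...
    app_url = '%s://%s%s' % (scheme, host,
                             quote(environ.get('SCRIPT_NAME', '')))
    # ...and PATH_INFO says what was requested beneath it.
    request_url = app_url + quote(environ.get('PATH_INFO', ''))
    if environ.get('QUERY_STRING'):
        request_url += '?' + environ['QUERY_STRING']
    return app_url, request_url
```

With a bare request URL, the application would have to be told separately
where it is mounted before it could produce the first of those two values.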

From ianb at colorstudy.com  Sat Aug 28 06:03:28 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat Aug 28 06:03:38 2004
Subject: [Web-SIG] Re: Regarding the WSGI draft
In-Reply-To: <5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com>
References: <5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<412E2C3E.7000900@kylotan.eidosnet.co.uk>
	<5.1.1.6.0.20040827165033.022967b0@mail.telecommunity.com>
	<5.1.1.6.0.20040827172056.025e6e50@mail.telecommunity.com>
	<5.1.1.6.0.20040827231446.021cd010@mail.telecommunity.com>
Message-ID: <41300410.8090404@colorstudy.com>

Phillip J. Eby wrote:
> At 05:09 PM 8/27/04 -0500, Ian Bicking wrote:
> 
>> Though it can cause problems.  E.g., if instead of the cgi server 
>> passing sys.stdout.write, it passed:
>>
>> def write(s):
>>     sys.stdout.write(s)
>>
>> That would cause all sorts of problems.  Unless it used 
>> sys.__stdout__.write(s); I don't know if that would be a good or bad 
>> style.  That's what I did to work around my bug.
> 
> 
> There's another way...  make the dummy file object put in for sys.stdout 
> do this:
> 
>     def write(self,data):
>         sys.stdout = self.__oldstdout__
>         try:
>             self.wsgi_writefunc(data)
>         finally:
>             sys.stdout = self
> 
> Voila.  Now, even if the WSGI server is written to use stdout, it still 
> works.  The same trick can and should be used for stdin and stderr.

Hmm... possibly.  Another thought I had was to buffer all the output, 
then only return as an iterator (or with a single call to the server's 
write function) when the application has finished.  This way the only 
problem would be with server extensions, as no server code would 
normally be written while the script was running.  Hrm, though that has 
its own problems if the script needs to stream output.  Yours would be 
more general in that case.
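
A rough sketch of the buffering approach, assuming for simplicity that the
script is a callable that prints only a text body (a real CGI script would
also print its own headers, which would have to be parsed out).  The name
buffered_cgi_app is made up for the example.

```python
import contextlib
import io


def buffered_cgi_app(script, status='200 OK'):
    """Adapt a print-style script (a callable) into a WSGI-style application."""

    def application(environ, start_response):
        buffer = io.StringIO()
        # Capture everything the script prints; nothing reaches the server
        # until the script has finished.
        with contextlib.redirect_stdout(buffer):
            script(environ)
        body = buffer.getvalue().encode('utf-8')
        start_response(status, [('Content-Type', 'text/plain; charset=utf-8'),
                                ('Content-Length', str(len(body)))])
        return [body]

    return application
```

Because the whole body is known before start_response is called, this
version can always send a Content-Length, at the cost of not streaming.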

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Sat Aug 28 06:51:57 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat Aug 28 06:52:03 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
Message-ID: <41300F6D.4050607@colorstudy.com>

Phillip J. Eby wrote:
>> I would hope that we can come to some consensus and produce something 
>> useable before 2.5, with the understanding that it will be included in 
>> 2.5.  I would kind of like to see a "web" package.
> 
> 
> I think we'll have better luck with a 'wsgi' package, but I could be 
> wrong.  'web' just seems like a nuisance attractor for all sorts of 
> unproductive bickering on so many levels.
> 
> On a more immediate practical level, we'd be crazy to try to claim 'web' 
> for a third-party package that we want to propose for the stdlib, but a 
> package named 'wsgi' would be more than fair game.

I would only want to use "web" if we could get agreement that it would 
be in 2.5 under that name.  I was thinking of it like a package for 
various Python web-related modules (the Next Generation; forgoing this 
current generation which is all in the root).

Almost all the modules in the root have issues.  Well, let's enumerate...

webbrowser: this seems like a totally weird module to me
cgi: ick ick ick.
cgitb: this is okay.
urllib: defunct?
urllib2: surprisingly hard to use in a number of ways.  There was some 
discussion about this early in Web-SIG.  I think the client stuff John 
Lee has done at: http://wwwsearch.sourceforge.net/ is better, and I 
think he's interested in that direction.  Probably not right now, but at 
some point this could well improve on urllib*
httplib: actually okay, kind of; needed for some things that urllib 
can't do.  But it also seems redundant in other ways.
urlparse: like os.path, this is a rather annoying module to use, though 
I guess it works fine.  I'd like to see something like Jason Orendorff's 
path module, but for URLs.
BaseHTTPServer, SimpleHTTPServer, CGIHTTPServer: it seems odd that these 
are three modules.  And none of the three actually claims to work that 
well.  It's wonky.  They're useful modules, but limited in scope.
Cookie: weird interface.  Has some insecure parts.  I think mod_python 
differs mostly in that it has secure alternatives.
xmlrpclib: a good module.
SimpleXMLRPCServer: like the HTTPServers, seems a little odd.
DocXMLRPCServer: what a weird module.
robotparser: never knew this existed.
HTMLParser: lives in the world between web and XML.  Some of the client 
tools in wwwsearch are very HTML-centric as well.  But it all fits together.
htmllib: deprecated, I think?  Or HTMLParser?  I don't know what's going 
on here.
htmlentitydefs: another odd little module.


Anyway, I think there's a case to be made for a new generation of web 
libraries, and a package to bring them together.

I don't know if we need deeper hierarchy than that.  E.g., 
web.wsgi.cgiadapter.  I don't think so.  I'd rather "WSGI" be a term 
only those in the know use -- it means nothing unless you expand the 
acronym, and even then it's pretty vague.  Ultimately I hope most web 
programmers just don't need to think about any of it.

>>> There's little harm in having a separate 'wsgi' distribution until 
>>> 2.5 rolls around.  I'm thinking the package should include:
>>>  * BaseHTTPServer-based WSGI server
>>>  * CGI-based WSGI gateway (run WSGI apps under CGI)
>>
>>
>> You've noted these are missing error handling.  What kind were you 
>> thinking of specifically?
>>
>> There's exception handling, which seems straightforward.
> 
> 
> Well, to be honest, I haven't a clue what one does about errors *after* 
> the headers are written.  You can't send anything useful to the client, 
> because the status is already set.
> 
> If you sent a Content-Length, you can break the connection before that 
> point, and it's a fair guess the client will know something's wrong.  If 
> you *didn't* send a content length and break the connection, the client 
> gets an incomplete file and maybe doesn't know it.  Sending an error 
> message once 'write()' has been called will garble the output.
> 
> All of these options are especially unsatisfactory when binary files are 
> involved, where "unsatisfactory" could mean anything from "annoying" to 
> "catastrophic" (e.g. garbling an executable).

Yes, you are right.  Which means the catcher has to keep track of the 
headers that were sent if it hopes to do anything.  In that case, it 
might check for text/html or text/plain; if not those two, then just 
stop the response short and log the error.  If so, and if configured to 
show errors, then it could display them; cgitb goes to some length to 
make HTML render correctly.
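
Something along these lines, as a sketch only: the wrapper records the
content type on the way through and uses it to decide what is safe to do
when the application blows up.  The names catch_errors and TEXT_TYPES are
invented here, and closing the application's iterable is omitted for
brevity.

```python
import sys
import traceback

TEXT_TYPES = ('text/html', 'text/plain')


def catch_errors(app, show_errors=True, log=sys.stderr):
    """Wrap app; append a traceback only when the response is textual."""

    def wrapper(environ, start_response):
        state = {'started': False, 'content_type': None}

        def recording_start_response(status, headers):
            state['started'] = True
            for name, value in headers:
                if name.lower() == 'content-type':
                    state['content_type'] = value.split(';')[0].strip().lower()
            return start_response(status, headers)

        try:
            for chunk in app(environ, recording_start_response):
                yield chunk
        except Exception:
            report = traceback.format_exc()
            log.write(report)
            if not state['started']:
                # Nothing sent yet, so a clean error response is possible.
                start_response('500 Internal Server Error',
                               [('Content-Type', 'text/plain')])
                yield b'Internal Server Error'
            elif show_errors and state['content_type'] in TEXT_TYPES:
                yield report.encode('utf-8', 'replace')
            # Otherwise stop short: the body may be binary.

    return wrapper
```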

That makes me think that wrapping start_response is more reasonable. 
Though it makes error resolution in servers more complex.

>>   Spec compliance?  Certainly an anal version of these servers should 
>> be written, that checks every type passed around, looks for common 
>> mistakes, etc.  I don't know if the anal and the useable version need 
>> to be the same thing.
> 
> 
> I wasn't even addressing spec compliance, although test suites for all 
> the implementations, factored so that they could be used as a basis for 
> testing other implementations, would certainly be nice.

Yes, I've meant to work on this.  I have a simple "echo" application 
that sends results based on the query; throwing errors, displaying text, 
displaying the environ, etc.  I was thinking that, along with a client, it 
could make a good structure for further testing.  Then the echo 
application could be coded in different styles of application as well -- 
for instance, jonpy, and the same tests run.  It would be useful for 
testing middleware as well.  I'll try to give it a go sometime soon.

>> Two models -- one that optimistically tries to load the cgi module in 
>> a fake environment (what I did), plus another that actually runs any 
>> CGI script.
> 
> I'm not following what the difference is, exactly, but I guess we'll 
> need to get into the design more.

One runner would actually fork a process and run the CGI script 
separately.  This would be useful for, say, implementing CGIHTTPServer 
in terms of WSGI.  It would always work, because it would actually run 
the script as a CGI script.
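
A sketch of that second runner, using the subprocess module from later
Python versions rather than a bare fork.  The function name run_cgi_script
is made up, and request-body handling is reduced to passing bytes on
stdin.

```python
import os
import re
import subprocess
import sys


def run_cgi_script(argv, environ, stdin_data=b''):
    """Run a CGI program in a child process; return (status, headers, body)."""
    child_env = os.environ.copy()
    # Only plain strings can cross the process boundary.
    child_env.update((k, v) for k, v in environ.items() if isinstance(v, str))
    proc = subprocess.run(argv, input=stdin_data, env=child_env,
                          stdout=subprocess.PIPE, check=True)
    # CGI output is a header block, a blank line, then the body.
    parts = re.split(br'\r?\n\r?\n', proc.stdout, maxsplit=1)
    head = parts[0]
    body = parts[1] if len(parts) > 1 else b''
    status, headers = '200 OK', []
    for line in head.decode('latin-1').splitlines():
        name, _, value = line.partition(':')
        if name.lower() == 'status':
            status = value.strip()
        else:
            headers.append((name, value.strip()))
    return status, headers, body
```

The returned triple is what a WSGI wrapper would feed to start_response
and then return as the body.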

>> I don't think the utility functions are a big deal at all, and I worry 
>> that there's some gotchas to email.Message, specifically where it is 
>> intended for email.  So I'm certainly not adamantly opposed to 
>> email.Message, but I'm not adamantly for it either.  I'd rather see a 
>> superclass of email.Message (such a superclass does not yet exist, but 
>> should be easy to write/extract) that is more minimal.
> 
> 
> Why don't you take a look at the code?  I have. 

Well good, now I don't need to ;)

> Here are the methods:
> 
> as_string, __str__ -- format the message as a string
> 
> is_multipart -- returns true if payload has been set to a list

Can you do this with HTTP?  I know some MIME stuff works (like 
content-disposition: attachment; filename=blah).  Would this work too? 
In a meaningful way?  The cgi module has some weird MIME stuff in it 
that I don't think any web client has ever exercised.

> get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, 
> get_charsets, walk -- stuff for manipulating parts of the message we 
> don't care about.

Yes.  If these accidentally are used, will it affect the as_string 
representation?

> set_charset/get_charset -- sets the character set parameters of the 
> content-type, which is actually useful.  On the down side, setting the 
> character set sets MIME-Version, but it also sets the 
> Content-Transfer-Encoding, so it doesn't force the server to default one.

Would that start opening up the possibility of accepting Unicode to 
write()/app_iter?

> __len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, 
> get, keys, values, items -- case-insensitive dictionary-like interface 
> (i.e., the stuff we mainly want)
> 
> get_all -- all values for a header name
> 
> add_header, replace_header -- more stuff we want

Very good, though not hard to reimplement.

> get_type, get_main_type, get_subtype, get_content_type, 
> get_content_maintype, get_content_subtype, 
> get_param, get_params, set_param, del_param, set_type, get_boundary, 
> set_boundary, get_content_charset -- miscellaneous content-type analysis 
> and manipulation.  Not necessarily very helpful, except maybe for 
> middleware.  But they hardly hurt.
> 
> get_filename -- extract filename from Content-Disposition if present.  
> Not particularly helpful, but also not damaging in any way.

Sure.

> 
> Perhaps more eyes should look at this, but I haven't found anything in 
> here that's damaging or even annoying apart from setting MIME-Version if 
> it's not there and the content-type is touched.

Okay, looking through the code briefly, I can't help but think that all 
the complex parts are parts we don't care about.  A case-insensitive 
dictionary that accepts multiple values for a key isn't hard to 
implement.  Certainly we could match the interface of email.Message 
where it applies.  If it ended up in the standard library, that's fine 
-- it's one of those things people keep reinventing anyway, so a 
canonical implementation would be good.
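
For a sense of scale, a minimal sketch of such a class follows.  The
method names mirror email.Message where they apply; the class name Headers
and the replace-on-assignment behaviour are choices made for this example
only.

```python
class Headers:
    """Case-insensitive header collection that allows repeated names."""

    def __init__(self):
        self._items = []  # list of (name, value), in insertion order

    def add_header(self, name, value):
        self._items.append((name, value))

    def get_all(self, name, default=None):
        name = name.lower()
        values = [v for k, v in self._items if k.lower() == name]
        return values or default

    def __getitem__(self, name):
        # Like email.Message: first value, or None when missing.
        values = self.get_all(name)
        return values[0] if values else None

    def __setitem__(self, name, value):
        # Unlike email.Message (which appends), replace existing values.
        del self[name]
        self.add_header(name, value)

    def __delitem__(self, name):
        name = name.lower()
        self._items = [(k, v) for k, v in self._items if k.lower() != name]

    def __contains__(self, name):
        return self.get_all(name) is not None

    def items(self):
        return list(self._items)
```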

>>> The only other thing that comes to mind is requiring servers to 
>>> support multiple 'start_response' calls in some way that makes sense 
>>> for exception handlers, while requiring it to still work in the case 
>>> where an extension API has already been used for output.
>>
>>
>> That seems too hard.
> 
> 
> Well, to some extent we have to look at the question of what should 
> happen in those circumstances anyway, whether we solve the problem in 
> that specific way or not.  Because if the application *does* call 
> start_response more than once, the server has to be able to handle it 
> *somehow*.  Really, the ultimate error handling *has* to be done by 
> servers, unless they want to take the route of crashing the entire 
> process when something bad happens.  :)

Good question.  I think servers should consider that an error, but they 
should handle that error gracefully.  Which probably means keeping a 
"has start_response already been called" flag.

Now, if I could get access to that flag from middleware... and maybe 
access to the headers and status that have already been sent... (and 
really, why not?  We aren't worried about streaming headers like we are 
about bodies)
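
Roughly what I have in mind on the server side, as a sketch: the class
name ResponseState and the send_headers callback are hypothetical, and how
(or whether) the object would be exposed to middleware is left open.

```python
class ResponseState:
    """Per-request bookkeeping a server could keep around start_response."""

    def __init__(self, send_headers):
        self._send_headers = send_headers   # hypothetical server callback
        self.started = False
        self.status = None
        self.headers = None

    def start_response(self, status, headers):
        if self.started:
            # A second call is an application error; report it without
            # touching the response that is already on the wire.
            raise RuntimeError('start_response already called with %r'
                               % self.status)
        self.started = True
        self.status = status
        self.headers = list(headers)
        self._send_headers(status, self.headers)
```

Middleware that could see such an object would be able to read started,
status and headers without wrapping start_response itself.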

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Sat Aug 28 18:56:35 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 28 18:56:42 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <41300F6D.4050607@colorstudy.com>
References: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com>

At 11:51 PM 8/27/04 -0500, Ian Bicking wrote:
>I don't know if we need deeper hierarchy than that.  E.g., 
>web.wsgi.cgiadapter.  I don't think so.  I'd rather "WSGI" be a term only 
>those in the know use -- it means nothing unless you expand the acronym, 
>and even then it's pretty vague.  Ultimately I hope most web programmers 
>just don't need to think about any of it.

Flat is better than nested; let's not mix other projects into this.  The 
WSGI stuff will have enough content to deserve a package of its own, and we 
don't want it to be dependent upon a bunch of "next generation" stuff 
that's not even designed yet.

>Yes, you are right.  Which means the catcher has to keep track of the 
>headers that were sent if it hopes to do anything.  In that case, it might 
>check for text/html or text/plain; if not those two, then just stop the 
>response short and log the error.  If so, and if configured to show 
>errors, then it could display them; cgitb goes to some length to make HTML 
>render correctly.
>
>That makes me think that wrapping start_response is more reasonable. Though 
>it makes error resolution in servers more complex.

I'm not sure I follow you.  The error handling in the server would look 
just like the handling in middleware, no?  In fact, this potentially sounds 
like a job for another boilerplate function in wsgi.util, or perhaps a 
class.  I imagine we might have an AbstractWSGIServer that defines basic 
start-response, write, and other operations, with abstract methods for 
sending/receiving data to and from the client, and various overrideable 
methods for policy.  The simple WSGIServer and CGI gateway would both 
derive from it, or perhaps delegate to it.
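
As a very rough sketch of the shape I mean (none of this exists yet, and
the names AbstractWSGIHandler, send_bytes and BufferHandler are invented
for illustration):

```python
import io


class AbstractWSGIHandler:
    """Shared start_response/write logic; subclasses supply raw output."""

    def __init__(self):
        self.status = None
        self.headers = None
        self.headers_sent = False

    def send_bytes(self, data):
        raise NotImplementedError   # transport-specific

    def start_response(self, status, headers):
        self.status, self.headers = status, list(headers)
        return self.write

    def write(self, data):
        if not self.headers_sent:
            self.headers_sent = True
            self.send_bytes(self.format_headers())
        self.send_bytes(data)

    def format_headers(self):
        # Overrideable policy: a CGI gateway would emit "Status:" instead.
        lines = ['HTTP/1.0 %s' % self.status]
        lines += ['%s: %s' % pair for pair in self.headers]
        return ('\r\n'.join(lines) + '\r\n\r\n').encode('latin-1')

    def run(self, app, environ):
        result = app(environ, self.start_response)
        try:
            for data in result:
                self.write(data)
            if not self.headers_sent:
                self.write(b'')   # flush headers for an empty body
        finally:
            if hasattr(result, 'close'):
                result.close()


class BufferHandler(AbstractWSGIHandler):
    """Toy transport that collects output in memory."""

    def __init__(self):
        super().__init__()
        self.out = io.BytesIO()

    def send_bytes(self, data):
        self.out.write(data)
```

Error-handling policy would then live in one place (run and write) rather
than being repeated in every server and gateway.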


>>Here are the methods:
>>as_string, __str__ -- format the message as a string
>>is_multipart -- returns true if payload has been set to a list
>
>Can you do this with HTTP?  I know some MIME stuff works (like 
>content-disposition: attachment; filename=blah).  Would this work too? In 
>a meaningful way?  The cgi module has some weird MIME stuff in it that I 
>don't think any web client has ever exercised.

The as_string/__str__ aren't really useful for HTTP, because they include 
the payload, and optionally a "unix from" line.  They'd only be useful in 
debugging, just to dump out some info.


>>get_unixfrom/set_unixfrom, add_payload/set_payload/get_payload/attach, 
>>get_charsets, walk -- stuff for manipulating parts of the message we 
>>don't care about.
>
>Yes.  If these accidentally are used, will it affect the as_string 
>representation?

Yes, which is why we don't need/care about them.


>>set_charset/get_charset -- sets the character set parameters of the 
>>content-type, which is actually useful.  On the down side, setting the 
>>character set sets MIME-Version, but it also sets the 
>>Content-Transfer-Encoding, so it doesn't force the server to default one.
>
>Would that start opening up the possibility of accepting Unicode to 
>write()/app_iter?

In my view, no, because then we'd force the server to know about every 
possible encoding the client and app can come up with.  If the app uses 
this, it should handle the encoding.  We might want to include a utility 
routine or two to pull what the client accepts out of HTTP_ACCEPT et al.
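
Such a utility could be as small as the sketch below, which orders the
names in an Accept-style header by q-value.  The function name
parse_accept is made up, and wildcard matching is left to the caller.

```python
def parse_accept(value):
    """Return the names in an Accept-style header, best q-value first."""
    choices = []
    for position, item in enumerate(value.split(',')):
        parts = [p.strip() for p in item.split(';')]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, val = param.partition('=')
            if key.strip().lower() == 'q':
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by original position.
            choices.append((-q, position, name))
    return [name for _, _, name in sorted(choices)]


environ = {'HTTP_ACCEPT_CHARSET': 'iso-8859-5, utf-8;q=0.9, *;q=0.1'}
charsets = parse_accept(environ.get('HTTP_ACCEPT_CHARSET', ''))
```

The application would then pick the first entry it can encode to and do
the encoding itself before calling write().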


>>__len__, __getitem__, __setitem__, __delitem__, __contains__, has_key, 
>>get, keys, values, items -- case-insensitive dictionary-like interface 
>>(i.e., the stuff we mainly want)
>>get_all -- all values for a header name
>>add_header, replace_header -- more stuff we want
>
>Very good, though not hard to reimplement.

But why should everybody reimplement it, if we're not going to be in the 
stdlib till 2005?


>Okay, looking through the code briefly, I can't help but think that all 
>the complex parts are parts we don't care about.

Not so; content-type parameter setting is quite handy.  For example, if 
you're doing multipart push, you'll need set_boundary, and get_boundary 
might also be useful.


>>Well, to some extent we have to look at the question of what should 
>>happen in those circumstances anyway, whether we solve the problem in 
>>that specific way or not.  Because if the application *does* call 
>>start_response more than once, the server has to be able to handle it 
>>*somehow*.  Really, the ultimate error handling *has* to be done by 
>>servers, unless they want to take the route of crashing the entire 
>>process when something bad happens.  :)
>
>Good question.  I think servers should consider that an error, but they 
>should handle that error gracefully.  Which probably means keeping a "has 
>start_response already been called" flag.
>
>Now, if I could get access to that flag from middleware... and maybe 
>access to the headers and status that have already been sent... (and 
>really, why not?  We aren't worried about streaming headers like we are 
>about bodies)

You dodged my question...  what are you going to *do* with that?  Because 
we need to formulate sensible error handling policies for the general case, 
including things like an I/O error due to the client disconnecting.

Here are possible loci of error:

    * Before start_response is called (application error)
    * During start_response (server error or application error)
    * After start_response, before first write  (application error)
    * During a write (server error or application error)
    * Between writes, before return (application error)
    * After return/during iteration (application error)
    * During a post-return write (server error or application error)
    * During 'close()' (application error)

The reason those are "server or application" is because start_response and 
write can fail due to bad data passed by the application, so it's really an 
application error in that case.  The server might fail for some other 
reason, of course, like a lost client connection.

One issue here is that an application or middleware error handler needs to 
know whether the error is the application's or the server's.  It makes no 
sense for a failed write to cause a middleware error handler to attempt to 
write some more data!  It seems we need an error parameter like:

    environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,...

Such that one would use:

    try:
        # invoke child application, etc.
    except environ['wsgi.fatal_errors']:
        raise
    except:
        # regular error handling here

In other words, an application or middleware component should abort if it 
receives one of these exception types.  I'm inclined to think that 
application WSGI programming errors should be treated as fatal: if the app 
sends bad parameters to start_response or write, there's little point in 
proceeding further.
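
Fleshed out into something runnable, the pattern might look like the
sketch below.  'wsgi.fatal_errors' is only the proposal from this message,
and ClientDisconnected and error_page are names invented for the example.
It covers errors raised while calling the application, not ones raised
later during iteration.

```python
class ClientDisconnected(Exception):
    """Hypothetical server exception for a lost client connection."""


def error_page(app):
    """Middleware that renders an error page unless the error is fatal."""

    def wrapper(environ, start_response):
        # An empty tuple matches nothing, so servers that do not set the
        # key simply get the regular handling below.
        fatal = environ.get('wsgi.fatal_errors', ())
        try:
            return app(environ, start_response)
        except fatal:
            # The server cannot accept more output; do not try to write.
            raise
        except Exception:
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')])
            return [b'Something went wrong']

    return wrapper
```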

From pje at telecommunity.com  Sat Aug 28 05:33:04 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat Aug 28 18:58:35 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <BFE4BFC5-F88C-11D8-82BE-000A95BD86C0@mnot.net>
References: <412FD033.6060607@colorstudy.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<D030AB65-F886-11D8-82BE-000A95BD86C0@mnot.net>
	<412FD033.6060607@colorstudy.com>
Message-ID: <5.1.1.6.0.20040827232241.030fa080@mail.telecommunity.com>

At 05:54 PM 8/27/04 -0700, Mark Nottingham wrote:
>Could you expand on the problems that would be encountered with pass-through?
>
>I don't think it would add redundancy in the CGI case, it would just 
>require CGI WSGI servers to remove http headers from the environment and 
>put them in the proper data structure.
>
>WRT URIs, my preference (once again, just stating what I'd do, not saying 
>that I think this MUST change) would be to base it on the underlying 
>specs, and not make it so CGI-centric; i.e., have 'abs_path' and 'query' 
>(these are the BNF productions in both 2396 and 2616), and that's it; 
>anything else (e.g., script location) would be in the environment, and 
>probably server-specific.

And now every framework that's already based on parsing CGI variables is 
stuck having to write code to turn all that stuff (including your 
"server-specific" application location) into CGI variables.  *And* we get 
to write something to take the CGI variables and turn them into this other 
format, and throw away the script location so the script can try to figure 
it back out again.

Why reinvent the wheel, when CGI has already shown itself to be of 
practical use for this?

From jjl at pobox.com  Sat Aug 28 19:18:57 2004
From: jjl at pobox.com (John J Lee)
Date: Sat Aug 28 19:19:01 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <41300F6D.4050607@colorstudy.com>
References: <5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
	<41300F6D.4050607@colorstudy.com>
Message-ID: <Pine.LNX.4.58.0408281809520.1136@alice>

On Fri, 27 Aug 2004, Ian Bicking wrote:
> Phillip J. Eby wrote:
[...]
> > wrong.  'web' just seems like a nuisance attractor for all sorts of
> > unproductive bickering on so many levels.
[...]
> be in 2.5 under that name.  I was thinking of it like a package for
> various Python web-related modules (the Next Generation; forgoing this
> current generation which is all in the root).

+0 for web as bag-of-modules

Seems uncontroversial, since anybody with a web module has an equal right
to lay claim to a patch of land within it.

OTOH, I've never found the Python practise of sticking all stdlib modules
in the root namespace to be troublesome.  And the reality is that there is
no grand scheme here: people generally do small pieces of work as they
find they need / want to do it.


> Almost all the modules in the root have issues.  Well, let's enumerate...
>
> webbrowser: this seems like a totally weird module to me

Why?


> cgi: ick ick ick.
> cgitb: this is okay.
> urllib: defunct?

It's not about to go away. (especially since Guido wrote it, I think ;-)

Unfortunately, I think there are enough bugs in both urllib and urllib2 that
it's hard to say that either is unconditionally better for all purposes.


> urllib2: surprisingly hard to use in a number of ways.  There was some
> discussion about this early in Web-SIG.  I think the client stuff John
> Lee has done at: http://wwwsearch.sourceforge.net/ is better, and I
> think he's interested in that direction.  Probably not right now, but at
> some point this could well improve on urllib*

This is what I hope to do on urllib2 for 2.5, very roughly in order of
priority.  I guess you're referring above mostly to 3 in this list.  1, 2
and 3 will likely happen, 4, 5, and 6 may or may not.

Help is welcome :-)

1 Add more handlers from ClientCookie: Robot rules, http-equiv, refresh, etc.

2 Add features that are present in urllib but missing from urllib2
  (urlretrieve is the most obvious, and easy to fix).

3 A class bearing some resemblance to mechanize.UserAgent, as we discussed
  here before.  The idea is to avoid having to make a new object each time
  you want to change URL-opener behaviour.

4 Possibly improve proxy, authentication support, if I can be bothered.  I
  think this is probably still quite buggy, despite valuable changes from
  Anthony Baxter and others.

5 Connection caching.

6 HEAD, GET byte range (and maybe something to make resuming downloads as
  easy as possible), conditional GET requests, a function to do file
  uploads.

[...]
> DocXMLRPCServer: what a weird module.

Weird indeed.  Never noticed it before.

[...]
> HTMLParser: lives in the world between web and XML.  Some of the client
> tools in wwwserver are very HTML-centric as well.  But it all fits together.
> htmllib: deprecated, I think?  Or HTMLParser?  I don't know what's going
> on here.

As you probably know, htmllib just adds some possibly-convenient bits and
pieces on top of sgmllib.

sgmllib/htmllib is more relaxed about bad HTML than is HTMLParser, so is
certainly worth keeping.


John
From py-web-sig at xhaus.com  Mon Aug 30 02:32:23 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Aug 30 02:54:40 2004
Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython.
Message-ID: <41327597.5060909@xhaus.com>

Dear Web-Sig,

Firstly, I must say, I am totally impressed with the WSGI initiative. At 
first it wasn't clear how such low level structures could improve the 
fragmented situation with python-web frameworks. But now that I've spent 
some time implementing a framework that complies with the spec, I 
understand it a *lot* better, and can see a lot of its benefits.

Secondly, I must apologise in advance for the length of this post :-)

I decided to write a java/j2ee/jython framework which layers WSGI on top 
of java servlets. I decided this for a number of reasons:

  - Because I want WSGI to succeed, and in open-source chances of 
success are greatly enhanced by running code.
  - Because jython needs to be included in WSGI from the ground up.
  - Because cpython and jython should be able to share web components.
  - Because WSGI needs testing against as many server architectures as 
possible.
  - Because the best way to test the quality and usability of a spec is 
to write software that implements it.
  - Because I pray for the day when we can pick and mix capabilities 
from the huge wealth of python web frameworks out there.
  - Because J2EE (i.e. traditional servlets) are sometimes far too 
restrictive, in terms of the way they handle cookies, authorisation, 
etc, and require configuring lots of XML files, which can be a pain: I 
don't like coding in XML, I like coding in python, where I can keep my 
configuration all in an appropriate format.
  - Because I want cpythonistas to keep jython in mind.
  - Because someone had to do it :-), and I do J2EE and jython stuff all 
the time in my work
  - Because WSGI was small enough to implement in a day or two.
  - A load of other good reasons.

My code is not ready for release. I only spent yesterday writing it: 
it's not big, approx 500 lines of java. But I haven't even compiled it 
yet, so it's got loads of syntax errors, no comments, no documentation, 
etc. I expect the compilation and debugging to take a day or two. 
However, I'm ridiculously busy at the moment, and really can't spare 
much time. The fact that I sacrificed my weekend to get jython WSGI 
up-and-running quickly may give you an idea of how important I consider 
the WSGI initiative. I promise I'll release my code by next weekend, 
whatever state it's in. If it's not 100% running, it'll be 90+% running, 
at least.

My design for the moment is really just to show a proof of concept, and 
a bare-bones framework. The framework will simply allow, through 
configuration, the user to map a URL to a python file, and to specify the 
name of a callable object within that file, which will obviously be the 
application. Application objects will be cached, based on the filename 
they came from. The request will be dispatched to the application in a 
WSGI compliant way. Simple. For the moment, I'm taking the easy way out, 
in relation to things like threading guarantees. Anything that asks to 
be single-threaded will still use a single instance, but calls from 
multiple threads will be synchronized on that single object, which 
wouldn't really work in a production framework. As WSGI evolves, I'll 
make these kinds of facilities more robust, scalable.

I don't see the point yet in trying to build any more facilities into my 
framework, e.g. url->object mapping, session management, page-template 
management components, authorization, etc. Hopefully, all of these 
facilities will become available as WSGI middleware components, written 
in nice python: not java, or nasty apache conf files, or servlet 
container XML files, blah, blah, blah.

Anyway, while I was writing my thing (with printed WSGI spec in hand, 
covered in annotations, tick marks and red ink :-), I came across a few 
points in the spec that I'd like to raise about things that are either 
observations, or things that are incompletely specified, or that induce 
me to misunderstand, or seem just right or wrong.

Also, I've spent today catching up with the web-sig archives, to review 
everyone's comments (now that I'm in a position to understand them), and 
to make sure that I'm not trolling over old ground. So I've added one or 
two points of my own, based on reading those archives. Hopefully some of 
them will be useful.

Lastly, does anyone have any name suggestions for a 
java/j2ee/jython WSGI-compliant framework? I've been thinking along the 
lines of "modjy", but I'm open to better ones :-)

So on to the points/questions.

0. On choice of CGI as a basis.
===============================

My experience with J2EE has clearly demonstrated to me that CGI is the 
right choice to base WSGI upon. The J2EE servlet spec has a specific 
method to return every single CGI variable: the specs even mention "this 
method returns the same as the CGI variable SCRIPT_NAME", etc. My job 
as "translator" couldn't have been easier. I expect that many other 
containers/frameworks will also support the CGI spec in this way.

1. Default values of environment variables when not present.
============================================================

The spec says that compulsory environment variables, for example 
"CONTENT_LENGTH" or "CONTENT_TYPE", must have a value, i.e. "must be 
present, but may be an empty string, if there is no more appropriate 
value for them". I read "empty string" to mean "".

There are obviously two different choices for how to represent values 
for headers/env-vars that are not present in the request, i.e. 1. an 
empty string as described above or 2. as a python None value. It seems 
more correct to me to use the latter option, None, for when the 
header/env-var is not available, i.e. the client did not send it. This 
allows the use of the "" value to indicate (the admittedly rare and 
malformed case) that the client sent the header name, but did not 
specify a header value. If WSGI uses the empty string for both cases, 
then we lose the ability to distinguish between when the header was sent 
with no value, and when it wasn't sent at all.

I don't think it's a big deal losing that ability, but I could imagine 
that there might be, for example, some security application that might 
like to have access to that information.

For simplicity of the spec, and robustness of servers/apps running on 
WSGI, I understand why it is a good thing to make the default values as 
robust as possible, i.e. in case some app author tries to use a header 
value without checking if it is None first.

I suppose I'm really pointing out a possible wording difficulty in the 
spec, which says "may be an empty string, if there is no more 
appropriate value". To me None is "a more appropriate value" sometimes, 
so I suppose I could legitimately interpret that to mean that I can use 
None values in my WSGI-compliant framework, because my server 
infrastructure allows me to detect their absence or lack of value.

So perhaps either the wording of the spec needs to be tightened up to 
exclude this? Or the default environment values need to be more clearly 
specified? Or perhaps a discussion of None vs. empty string needs to be 
added to the Q&A at the end?
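
Whichever way the spec goes, an application can be written so that it does
not care which convention a server picked. A minimal sketch, where
read_content_length is a hypothetical helper name, that treats a missing
key, None and "" alike:

```python
def read_content_length(environ):
    """Return the request body length, or 0 if it is absent or malformed."""
    # A missing key, None and "" all end up as the empty string here.
    raw = environ.get('CONTENT_LENGTH') or ''
    try:
        length = int(raw)
    except ValueError:
        return 0
    # A negative length makes no sense, so treat it as "no body".
    return max(length, 0)
```

The cost of this tolerance is exactly the one described above: the
application can no longer tell "header sent empty" from "header not sent".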

2. The SCRIPT_NAME variable.
============================

At first I was a little wary of the SCRIPT_NAME variable, and how I 
would construct it, until I realised that the beginning of the 
URL->Callable mapping is outside the scope of WSGI: it is in the control 
of whichever program/process/container is receiving HTTP requests 
through sockets from the client, and resolving/dispatching them 
according to its configuration files: in my case that was a J2EE 
container, e.g. Tomcat.

The J2EE call that returns a value equivalent to the CGI SCRIPT_NAME 
variable is the HttpServletRequest.getServletPath() method. There is an 
interesting note on it which says that "This method will return an empty 
string ("") if the servlet used to process this request was matched 
using the "/*" pattern." Which seems a little odd, until you realise 
that the SCRIPT_NAME = "" case is when the application object is 
responsible for dealing with the entire URL space. Maybe it's worth 
adding a note to this effect in the WSGI spec as well? It helped me 
understand things better.

An idea occurs to me for a nice little reusable WSGI middleware 
component which is a URI mapper, with functionality akin to apache 
mod_rewrite, resolving URIs to python callables. A lot of frameworks 
like to do things with URL rewriting and mapping, in order to present a 
nice clean URL interface to a tree of objects. Quixote is one such 
framework that likes to have crisp URLs. But much of the time installing 
such frameworks requires configuring apache and invoking mod_rewrite and 
its "cool voodoo" to get the job done. Which can be difficult to debug 
and get working, and scares newbies. (On re-reading the spec, and the 
mailing list, I see I'm not the only one to have thought of such a uri 
mapping component :-)

If I wrote such a reusable mapping component, I could then simply 
configure my entire "container", e.g. Apache, Tomcat, etc, etc, to 
simply resolve all requests for a URL hierarchy to my python component, 
and nice-n-easy python code takes care of it from there, no mod_rewrite 
rules, no complex java servlets mapping algorithm: just python. A big 
win in terms of both installation simplicity and portability, since that 
standard component could then be used across all WSGI frameworks and the 
containers in which they live. I like this WSGI idea :-)
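
A rough sketch of such a mapping component, written for a current Python.
PrefixMapper is a hypothetical name, and plain prefix matching is an
assumption made to keep the example short (nothing as rich as mod_rewrite
rules). It moves the matched prefix from PATH_INFO onto SCRIPT_NAME before
calling the chosen application:

```python
class PrefixMapper:
    """Dispatch to one of several WSGI applications by URL path prefix."""

    def __init__(self, mapping):
        # Try the longest prefixes first, so '/blog/admin' wins over '/blog'.
        self.mapping = sorted(mapping.items(),
                              key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mapping:
            if path == prefix or path.startswith(prefix + '/'):
                # Copy the environ so the caller's dictionary is untouched.
                environ = dict(environ)
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found\n']
```

Mapping the empty prefix '' to an application makes it the catch-all for
the whole URL space, which is the SCRIPT_NAME = "" case described above.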

3. Status code and message.
===========================

The WSGI spec states that the status value passed to start_response 
should be of the form "999 Message here". That's fine, I can parse up 
the string easily enough to get the java data types I need to send to 
the container. However, J2EE does not allow me to set the message 
string: I can only set the status code, and that must have an integer 
value.

So, in terms of compliance with WSGI, am I in violation of the WSGI spec 
by not transmitting the actual textual status message specified by the 
application? If that's a problem, there's nothing I can do about it.

I wonder how often this will be the case with other server/container 
frameworks?
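
For reference, the parsing step is small. A sketch of how a gateway might
split the status string into the integer a container wants plus the
message it may have to drop (split_status is a hypothetical helper name):

```python
def split_status(status):
    """Split a WSGI status string such as '404 Not Found' into (int, str)."""
    code, _, reason = status.partition(' ')
    # int() raises ValueError if the application passed a malformed code.
    return int(code), reason
```

A container that only accepts the integer would use the first element and
ignore the second.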

4. Binary vs. textual writing.
==============================

Normally, when python opens a file in text mode, line-ending translation 
takes place on all python strings written to the file, changing '\n' to 
whatever is the appropriate local line-ending. This is not noticeable on 
*nix, since *nix uses the same line-ending character as python, '\n', so 
no translation is necessary. This means that people running python on 
*nix can write binary data through channels opened in text mode. On 
other platforms though, namely Windows and MacOS, different line-endings 
are used, and python's '\n' gets translated to '\r\n' and '\r' 
respectively. Which corrupts binary files, e.g. .jpg, .gif, if they 
contain '\n'. So Windows and MacOS python users must open files 
explicitly in binary mode if they want to avoid this translation.

It is a fundamental requirement (to me at least) that WSGI be able to 
handle writing of binary data. And I'm fairly sure the intention for the 
write() callable in WSGI is that it take python "strings", which 
includes strings of binary data. But perhaps it needs to be made explicitly 
clear in the WSGI spec that the write() callable writes in 
binary mode, i.e. that no translation is taking place on byte strings 
passed to it, and the application/user is responsible for all encoding 
concerns relating to byte strings passed to the write() callable.
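
A small illustration of the corruption, written for a current Python. It
uses io.TextIOWrapper with newline='\r\n' to stand in for a Windows
text-mode channel; the payload bytes are made up for the example:

```python
import io

payload = b'GIF89a\n\x00\x01'   # made-up "binary" data containing a newline

# Text-mode channel: '\n' is translated to '\r\n' on the way out.
raw = io.BytesIO()
text_channel = io.TextIOWrapper(raw, encoding='latin-1', newline='\r\n')
text_channel.write(payload.decode('latin-1'))
text_channel.flush()
translated = raw.getvalue()

# Binary-mode channel: the bytes arrive exactly as written.
binary_channel = io.BytesIO()
binary_channel.write(payload)
untouched = binary_channel.getvalue()
```

Here translated is one byte longer than the payload, while untouched is
identical to it, which is the behaviour write() would need to guarantee.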

5A. Python 2.1 vs. python 2.2: iterators and generators.
========================================================

The WSGI spec says that python 2.2 features are required to be 
compliant. However, it appears to me that the only python 2.2 features 
in use are iterators and generators, used when the application object 
returns an iterator. In fact, it's just that the example in the WSGI 
spec uses a generator (and its corresponding 'yield' keyword): actual 
applications are not required to use a generator: they can also return 
an object that implements the iterator protocol. Which means returning 
an object with a .next() method when the .__iter__() method is called. 
The iterator.next() method keeps returning values, until the iterator 
runs out, in which case it raises StopIteration. Like generators, the 
iterator protocol was also introduced in python 2.2, but they are two 
separate things.

However, even though jython is based on python 2.1, and thus doesn't 
have built-in support for either iterators or generators, I have still 
implemented the iterator protocol in my java/jython framework, by simply 
invoking the .__iter__() and .next() methods on application objects, and 
catching StopIteration exceptions. So I can support components and 
applications returning iterators, and I'm thus compliant with the spec, 
even though I'm running on 2.1. (This is only possible because I'm 
embedding: it is still not possible to support the iterator protocol in, 
say, jython for-loops)

Does the spec need to be changed to reflect this iterators/versioning 
issue? Or to more clearly define the difference between iterators and 
generators?

It's conceivable that even a python 1.5 framework could be programmed to 
support the iterator protocol: it's *very* easy to implement.
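
To show how little is involved, here is a sketch of both sides, written
for a current Python, where the method is spelled __next__() rather than
the 2.2-era next(). Chunks and send_all are hypothetical names: the first
is an application-side iterator written without generators, the second is
the loop an embedding server could run by hand:

```python
class Chunks:
    """A hand-written iterator over a list of byte strings (no generators)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._chunks):
            raise StopIteration
        chunk = self._chunks[self._pos]
        self._pos += 1
        return chunk

    next = __next__   # the Python 2.2 spelling of the same method


def send_all(result, write):
    """Drive the iterator protocol by hand, as an embedding server might."""
    iterator = result.__iter__()
    while True:
        try:
            chunk = iterator.__next__()
        except StopIteration:
            break
        write(chunk)
```

Nothing here needs a for-loop or the yield keyword, which is the
distinction between iterators and generators drawn above.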

5B. A "python.version" WSGI variable?
=====================================

Of course, it will be the case that some middleware and applications will 
need to use more advanced and recent (2.2, 2.3, 2.4) language 
features, such as generators, generator expressions, decorators, etc. 
But such components and applications will not be usable under jython, 
which is 2.1. It would be nice for components and applications to have a 
way of knowing what version of python they are running under. Similarly, 
there will be jython components and applications that require java 
libraries, and thus won't be usable on cpython of any version.

Would it be useful to define a WSGI variable "python.version", similar 
to "wsgi.version", which gives the python version in effect? In most 
cases under jython, it wouldn't help, because its 2.1 compiler would 
choke when loading python files with newer python syntax anyway, giving 
syntax errors. But it might be useful in some circumstances, perhaps for 
sophisticated dispatchers with the requisite meta-data available to 
them? I'm not sure on this one. Maybe the values of sys.platform and 
os.name give enough information to deal with this problem?
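
A sketch of what that could look like. "python.version" is only the key
floated above, not part of any spec, and add_python_version and
require_python are hypothetical helper names:

```python
import sys


def add_python_version(environ):
    """Server side: record the running interpreter version in the environ."""
    environ['python.version'] = tuple(sys.version_info[:3])
    return environ


def require_python(environ, minimum):
    """Component side: True if the recorded version is at least `minimum`."""
    # A server that sets no version is treated as too old.
    return environ.get('python.version', (0,)) >= tuple(minimum)
```

A dispatcher could call require_python(environ, (2, 2)) before routing a
request to a component that needs iterators, for example.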

6. Streaming and flushing.
==========================

I see there has been discussion on the list about streaming output and 
flushing. In one message, Phillip said "I'm suggesting that write() 
should be guaranteed to either:

    1) Flush all output before returning, or
    2) Put data in a buffer that will be emptied by another thread or by
       the operating system

To be a conforming implementation, a server/gateway must do one or the
other."

In the J2EE case (and I'm sure with Apache CGI), that's very simple to 
deal with, since the container will do its own buffering completely 
outside your control, and send the pieces with chunked-transfer encoding 
if necessary. So even if I put a flush on the output channel in my 
framework, I'm only flushing it to the container's buffer: it's still 
not guaranteed to send output back down the return socket to the client.

Just a datapoint.

7. Redirects.
=============

I read some discussion in the lists on how to handle container specific 
facilities, e.g. Apache/mod_python's ability to internally redirect a 
request.

J2EE offers the same capabilities, to internally redirect a request, 
without sending a response back to the client. It happens in a slightly 
different way, because you first ask your container for a dispatcher, 
based on a url, and then call that dispatcher to redirect to the URL. 
And the client may not see any redirect HTTP responses: it's all 
internal to the container.

I see the solution to this redirect platform-dependence problem in the 
implementation of a platform-independent WSGI middleware component that 
takes all responsibility for redirects. This component examines the 
WSGI environment, seeking hints for the optimal way to redirect 
the request: if mod_python is available, use the mod_python API call: 
if modjy is available, use the getDispatcher(uri).redirect() dance, etc. 
If none of these platform specific techniques are available, it can fall 
back to sending a 302 or 307 response back to the client, and let the 
client re-request the new URL.

If the platform specific techniques are available, their availability 
will be signalled in the WSGI environment by the presence of variables such 
as "mod_python.request" or "modjy.servlet_context", etc. So one 
ultraportable component could do it all (albeit chock full of special 
cases).
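
A sketch of the shape such a component could take, written for a current
Python. To avoid guessing at any container's API, the platform-specific
redirects are passed in as callables by whoever deploys it;
make_redirector and the handler signature are assumptions made for
illustration:

```python
def make_redirector(internal_handlers=()):
    """Build a redirect helper.

    `internal_handlers` is a sequence of (environ_key, handler) pairs. Each
    handler is a deployer-supplied callable that performs the
    container-specific internal redirect.
    """
    def redirect(environ, start_response, location):
        for key, handler in internal_handlers:
            if key in environ:
                # The container announced itself: use its internal redirect.
                return handler(environ, start_response, location)
        # Portable fallback: let the client re-request the new URL.
        start_response('302 Found', [('Location', location),
                                     ('Content-Type', 'text/plain')])
        return [b'Redirecting\n']
    return redirect
```

The special cases then live in the handler list rather than in the
component itself, e.g. one pair keyed on "mod_python.request" and another
on "modjy.servlet_context".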

Problem solved?

8. Write callable and fileno()
==============================

It is a good idea to check for the fileno() attribute on the write 
callable, since many platforms/frameworks have high-performance ways of 
transferring file contents to sockets, for example. Java 1.4 nio has 
this capability, through the use of directBuffers, memory-mapped files, 
and special natively implemented methods to transfer between the two. 
I'd be surprised if containers like Apache don't support something 
similar. This can drastically improve throughput on static files.

Java objects have "channel"s, or "outputStream"s not "fileno"s. But 
that's an easy problem to fix.

9. Server-detected headers.
===========================

I can see the reason for servers/containers intercepting client headers 
and translating/augmenting/deleting them. However, do we need a 
specification of what to do with certain specified headers? As with 
CGI, should I recognise the "Status: " header or the "Location: " 
header, and translate it to the relevant status code, or do a redirect, 
respectively? If I don't do those translations, won't I be breaking 
reams of python CGI code out there that relies on Apache doing this?
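
If such a translation were wanted, it could live in a small adapter rather
than in every server. A simplified sketch, where cgi_headers_to_status is
a hypothetical name and any Location header without an explicit Status is
assumed to mean a 302 (CGI's local, path-only redirects are not handled):

```python
def cgi_headers_to_status(headers):
    """Return (status, remaining_headers) from CGI-style (name, value) pairs."""
    status = None
    remaining = []
    has_location = False
    for name, value in headers:
        lowered = name.lower()
        if lowered == 'status':
            # "Status:" is a CGI pseudo-header: consume it, don't forward it.
            status = value.strip()
            continue
        if lowered == 'location':
            has_location = True
        remaining.append((name, value))
    if status is None:
        status = '302 Found' if has_location else '200 OK'
    return status, remaining
```

The returned pair has the form start_response expects, so legacy CGI
output could be adapted without the server knowing about these headers.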

10. The "wsgi.errors" environment variable.
==========================================

Under J2EE, setting the "wsgi.input" variable is easy, I just wrap the 
HttpServletRequest.getInputStream() with an org.python.core.PyFile, and 
bingo.

However, the J2EE HttpServletRequest has no corresponding error stream, 
nor does the corresponding HttpServletResponse paired with each request. 
The only mechanism I can use to send error output is the "sendError(int, 
message)" method of HttpServletResponse. This allows me to send both an 
integer status code and a textual message, about which the J2EE docs say "The 
server defaults to creating the response to look like an HTML-formatted 
server error page containing the specified message, setting the content 
type to "text/html", leaving cookies and other headers unmodified".

So I can't send error output this way without also knowing a status code 
for it as well.

Which makes me wonder what the "wsgi.errors" variable is for? Yes, it's 
for writing error data. But what do we expect to happen to data that 
gets written to it? Will it be wrapped or translated in some way, and 
used to construct an error response to the user? Or should it be 
locally logged by the server?

I know that this is all J2EE specific stuff, as is confirmed by the rest 
of the documentation sentence I quoted above: "If an error-page 
declaration has been made for the web application corresponding to the 
status code passed in [to the sendError method], it will be served back 
in preference to the suggested msg parameter." WSGI (rightly) has no 
concept of "configured error page declarations", so it would seem the 
"sendError" method is not the right method to use to implement 
"wsgi.errors".

So I'm going to have to treat the error output in some other way, which 
means I need to know more about what it is. Before I can implement a 
jython framework that is fully compliant with the WSGI spec, I need to 
know what will happen to any output sent to "wsgi.errors", so that I can 
code for whatever eventualities arise.

Or if it's always to be a framework specific thing, maybe I'll just 
redirect all "wsgi.errors" output to /dev/null, for example? The J2EE 
ServletContext for each servlet has a "log(message)" method. Maybe I 
should just send error output there, in which case it will end up in the 
server logs?
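
If logging turns out to be the answer, the adapter is small. A sketch in
plain python of a file-like object that forwards complete lines to a
logger; ErrorLogStream is a hypothetical name, and the logger stands in
for whatever the container offers (ServletContext.log in my case):

```python
class ErrorLogStream:
    """File-like object for "wsgi.errors" that forwards lines to a logger."""

    def __init__(self, logger):
        self._logger = logger
        self._buffer = ''

    def write(self, text):
        # Collect partial writes and log each complete line as one entry.
        self._buffer += text
        while '\n' in self._buffer:
            line, self._buffer = self._buffer.split('\n', 1)
            self._logger.error(line)

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def flush(self):
        # Emit whatever is left, even without a trailing newline.
        if self._buffer:
            self._logger.error(self._buffer)
            self._buffer = ''
```

Any object with an error(message) method will do as the logger, for
example one returned by logging.getLogger() in the standard library.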

That's all for now.

onwards-and-upwards-ly y'rs,

Alan.


From ianb at colorstudy.com  Mon Aug 30 04:22:25 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 04:22:32 2004
Subject: [Web-SIG] My experiences implement WSGI on java/j2ee/jython.
In-Reply-To: <41327597.5060909@xhaus.com>
References: <41327597.5060909@xhaus.com>
Message-ID: <41328F61.20705@colorstudy.com>

Great to see more implementation.  My thoughts on some of the questions 
(only quoting the relevant portions)...

Alan Kennedy wrote:
> 1. Default values of environment variables when not present.
> ============================================================
> 
> The spec says that compulsory environment variables, for example 
> "CONTENT_LENGTH" or "CONTENT_TYPE", must have a value, i.e. "must be 
> present, but may be an empty string, if there is no more appropriate 
> value for them". I read "empty string" to mean "".
> 
> There are obviously two different choices for how to represent values 
> for headers/env-vars that are not present in the request, i.e. 1. an 
> empty string as described above or 2. as a python None value. It seems 
> more correct to me to use the latter option, None, for when the 
> header/env-var is not available, i.e. the client did not send it. This 
> allows the use of the "" value to indicate (the admittedly rare and 
> malformed case) that the client sent the header name, but did not 
> specify a header value. If WSGI uses the empty string for both cases, 
> then we lose the ability to distinguish between when the header was sent 
> with no value, and when it wasn't sent at all .

Elsewhere in the spec (I forget where) I believe it is very strict that 
all CGI variables (if present) must have non-unicode string values.  So 
None would not be allowed in any CGI variable (only extension 
variables).  I think for all the required variables using the empty 
string should be sufficient to indicate ambiguity.  Applications can't 
depend on there being a good distinction between a missing key and an 
empty string, as different parent containers can go either way, so the 
WSGI gateway might not have any information to work on.

> 2. The SCRIPT_NAME variable.
> ============================
> 
> At first I was a little wary of the SCRIPT_NAME variable, and how I 
> would construct it, until I realised that the beginning of the 
> URL->Callable mapping is outside the scope of WSGI: it is in the control 
> of whichever program/process/container is receiving HTTP requests 
> through sockets from the client, and resolving/dispatching them 
> according to its configuration files: in my case that was a J2EE 
> container, e.g. Tomcat.
> 
> The J2EE call that returns a value equivalent to the CGI SCRIPT_NAME 
> variable is HTTPServletRequest.getServletPath method. It is an 
> interesting note on it which says that "This method will return an empty 
> string ("") if the servlet used to process this request was matched 
> using the "/*" pattern." Which seems a little odd, until you realise 
> that the SCRIPT_NAME = "" case is when the application object is 
> responsible for dealing with the entire URL space. Maybe it's worth 
> adding a note to this effect in the WSGI spec as well? It helped me 
> understand things better.

That makes sense to me.  I don't think SCRIPT_NAME should ever be "/" -- 
usually PATH_INFO should either be the empty string, or start with /, so 
if your application applies to the root domain then PATH_INFO should be 
the entire request URL, and SCRIPT_NAME the empty string.

> An idea occurs to me for a nice little reusable WSGI middleware 
> component which is a URI mapper, with functionality akin to apache 
> mod_rewrite, resolving URIs to python callable's. A lot of frameworks 
> like to do things with URL rewriting and mapping, in order to present a 
> nice clean URL interface to a tree of objects. Quixote is one such 
> framework that likes to have crisp URLs. But much of the time installing 
> such frameworks requires configuring apache and invoking mod_rewrite and 
> its "cool voodoo" to get the job done. Which can be difficult to debug 
> and get working, and scares newbies. (On re-reading the spec, and the 
> mailing list, I see I'm not the only one to have thought of such a uri 
> mapping component :-)

Definitely.  I like the idea that most WSGI servers and middleware 
(except for the URL mappers) would just take a single application, to 
keep the techniques separate.

> 3. Status code and message.
> ===========================
> 
> The WSGI spec states that the status value passed to start_response 
> should be of the form "999 Message here". That's fine, I can parse up 
> the string easily enough to get the java data types I need to send to 
> the container. However, J2EE does not allow me to set the message 
> string: I can only set the status code, and that must have an integer 
> value.

That raises an interesting question.  As far as I know, no client ever 
pays any attention to the message.  It's purely noise, conveying no 
information.  It might make sense, for simplicity, for the status code 
to be an integer, as it apparently is in Java.

> 5A. Python 2.1 vs. python 2.2: iterators and generators.
> ========================================================
> 
> The WSGI spec says that python 2.2 features are required to be 
> compliant. However, it appears to me that the only python 2.2 features 
> in use are iterators and generators, used when the application object 
> returns an iterator. In fact, it's just that the example in the WSGI 
> spec uses a generator (and its corresponding 'yield' keyword): actual 
> applications are not required to use a generator: they can also return 
> an object that implements the iterator protocol. Which means returning 
> an object with a .next() method when the .__iter__() method is called. 
> The iterator.next() method keeps returning values, until the iterator 
> runs out, in which case it raises StopIteration. Like generators, the 
> iterator protocol was also introduced in python 2.2, but they are two 
> separate things.
> 
> However, even though jython is based on python 2.1, and thus doesn't 
> have built-in support for either iterators or generators, I have still 
> implemented the iterator protocol in my java/jython framework, by simply 
> invoking the .__iter__() and .next() methods on application objects, and 
> catching StopIteration exceptions. So I can support components and 
> applications returning iterators, and I'm thus compliant with the spec, 
> even though I'm running on 2.1. (This is only possible because I'm 
> embedding: it is still not possible to support the iterator protocol in, 
> say, jython for-loops)
> 
> Does the spec need to be changed to reflect this iterators/versioning 
> issue? Or to more clearly define the difference between iterators and 
> generators?
> 
> It's conceivable that even a python 1.5 framework could be programmed to 
> support the iterator protocol: it's *very* easy to implement.

That's also an interesting question.  I guess with both Jython and Zope 
2.6 and earlier being Python 2.1, it should be given some consideration.

One question: should the application iterable be a Python 2.2 style 
iterable?  I.e., it is up to Python 2.1 servers to implement the Python 
2.2 iterator protocol themselves?  Or, should the application be 
responsible to return an iterator, appropriate for the Python version?

In Python <2.2 (including 1.5.2) the protocol was that you called 
__getitem__ with ever-increasing integers, until an IndexError was 
raised.  There was no concept of a special __iter__() function.  But I 
guess Python 2.2's iter() builtin could be simulated:

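import types

def iter(obj):
     # Stand-in for the Python 2.2 builtin on older interpreters; the
     # result only has to work in a for-loop (__getitem__/IndexError).
     if type(obj) in (types.ListType, types.TupleType):
         return obj
     elif type(obj) is types.FileType:
         return FileIter(obj)
     elif hasattr(obj, '__iter__'):
         return IterWrapper(obj.__iter__())
     else:
         return IterWrapper(obj)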

class FileIter:
     def __init__(self, file):
         self.file = file
     def __getitem__(self, index):
         # while this copies Python 2.2, you wouldn't actually have to
         # iterate line by line:
         value = self.file.readline()
         if value == '':
             raise IndexError
         return value

class IterWrapper:
     def __init__(self, obj):
         self.obj = obj
     def __getitem__(self, index):
         # we ignore the index
         try:
             return self.obj.next()
         except StopIteration:
             raise IndexError

Then in Jython you'd do:

for s in iter(obj):
     write(s)

One issue is that StopIteration isn't defined in earlier versions of 
Python.  You may be able to add it to __builtins__.

Obviously none of this means anything if the application uses 
generators, but in many cases that should make it more portable.

I think it might be the right idea to have the server implement this 
kind of backward portability, rather than applications.  But that might 
be something for the spec, if so.

> 5B. A "python.version" WSGI variable?
> =====================================
> 
> Of course, it will be the case that some middleware and applications will 
> need to use more advanced and recent (2.2, 2.3, 2.4) language 
> features, such as generators, generator expressions, decorators, etc. 
> But such components and applications will not be usable under jython, 
> which is 2.1. It would be nice for components and applications to have a 
> way of knowing what version of python they are running under. Similarly, 
> there will be jython components and applications that require java 
> libraries, and thus won't be usable on cpython of any version.
> 
> Would it be useful to define a WSGI variable "python.version", similar 
> to "wsgi.version", which gives the python version in effect? In most 
> cases under jython, it wouldn't help, because its 2.1 compiler would 
> choke when loading python files with newer python syntax anyway, giving 
> syntax errors. But it might be useful in some circumstances, perhaps for 
> sophisticated dispatchers with the requisite meta-data available to 
> them? I'm not sure on this one. Maybe the values of sys.platform and 
> os.name give enough information to deal with this problem?

sys.version_info has the information you are looking for.
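
A component that cares could check it directly.  A small sketch (the 
function name is invented for illustration):

```python
import sys

def has_iterator_protocol(version_info=None):
    """Return True if the given (or running) Python is 2.2 or later,
    i.e. new enough to have iter() and StopIteration built in."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (2, 2)
```

That only helps code that already compiled, of course; it does nothing 
about syntax errors at import time.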

> 7. Redirects.
> =============
> 
> I read some discussion in the lists on how to handle container specific 
> facilities, e.g. Apache/mod_python's ability to internally redirect a 
> request.
> 
> J2EE offers the same capabilities, to internally redirect a request, 
> without sending a response back to the client. It happens in a slightly 
> different way, because you first ask your container for a dispatcher, 
> based on a url, and then call that dispatcher to redirect to the URL. 
> And the client may not see any redirect HTTP responses: it's all 
> internal to the container.
> 
> I see the solution to this redirect platform-dependence problem in the 
> implementation of a platform-independent WSGI middleware component that 
> takes all responsibility for redirects. This component examines the 
> wsgi.environment present, seeking hints for the optimal way to redirect 
> the request: if mod_python is available, use the mod_python API call: 
> if modjy is available, use the getDispatcher(uri).redirect() dance, etc. 
> If none of these platform specific techniques are available, it can fall 
> back to sending a 302 or 307 response back to the client, and let the 
> client re-request the new URL.
> 
> If the platform specific techniques are available, their availability 
> will be signalled in wsgi.envvars by the presence of variables such as 
> "mod_python.request" or "modjy.servlet_context", etc. So one 
> ultraportable component could do it all (albeit chock full of special 
> cases).
> 
> Problem solved?

I can also imagine in some future version of WSGI (or some standard 
building on it) that we could decide on a standard interface for doing 
internal redirects, available under a standard key.

> 9. Server-detected headers.
> ===========================
> 
> I can see the reason for servers/containers intercepting client headers 
> and translating/augmenting/deleting them. However, do we need a 
> specification of what to do with certain specified headers? As with 
> CGI, should I recognise the "Status: " header or the "Location: " 
> header, and translate it to the relevant status code, or do a redirect, 
> respectively? If I don't do those translations, won't I be breaking 
> reams of python CGI code out there that relies on Apache doing this?

Right now there should be no Status header, and a Location header should 
not imply a redirect, unlike with CGI.  Any CGI responses have to be 
wrapped to comply.  But there's other issues besides this, so they 
already had to be wrapped.

> 10. The "wsgi.errors" environment variable.
> ==========================================
> 
> Under J2EE, setting the "wsgi.input" variable is easy, I just wrap the 
> HttpServletRequest.getInputStream() with an org.python.core.PyFile, and 
> bingo.
> 
> However, the J2EE HttpServletRequest has no corresponding error stream, 
> nor does the corresponding HttpServletResponse paired with each request. 
> The only mechanism I can use to send error output is the "sendError(int, 
> message)" method of HttpServletResponse. Which allows me to send both an 
> integer status code and a textual message, which the J2EE docs say "The 
> server defaults to creating the response to look like an HTML-formatted 
> server error page containing the specified message, setting the content 
> type to "text/html", leaving cookies and other headers unmodified".

Stuff to wsgi.errors isn't supposed to go to the client.  Under Apache 
it would typically end up in the error log.  Under CGI wsgi.errors is 
usually stderr (and CGI scripts run under Apache that write to stderr 
also end up writing to the error log).  Error logs -- at least the kind 
that WSGI implies -- are fairly free form.  Though I guess a server 
could buffer the output sent to wsgi.errors, put in some delimiters, add 
some request information, and turn it into a nicely formatted log entry.
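
Roughly what I have in mind, as a sketch only.  The class name, the 
render() method, and the delimiter format are all invented here; only 
write(), writelines() and flush() are meant to mirror a file-like error 
stream:

```python
class BufferedErrors:
    """Collects text written during one request and renders it as a
    single delimited log entry.  Hypothetical helper, not part of WSGI."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def flush(self):
        # Nothing to do: output is held until render() is called.
        pass

    def render(self, request_line):
        """Return one formatted entry, or '' if nothing was written."""
        body = "".join(self.parts)
        if not body:
            return ""
        if not body.endswith("\n"):
            body = body + "\n"
        return "--- %s ---\n%s--- end ---\n" % (request_line, body)
```

The server would create one of these per request, pass it as 
wsgi.errors, and append render(...) to its real log once the request 
finishes.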

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From pje at telecommunity.com  Mon Aug 30 04:53:12 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 04:53:33 2004
Subject: [Web-SIG] My experiences implement WSGI on
  java/j2ee/jython.
In-Reply-To: <41327597.5060909@xhaus.com>
Message-ID: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>

At 01:32 AM 8/30/04 +0100, Alan Kennedy wrote:

>I suppose I'm really pointing out a possible wording difficulty in the 
>spec, which says "may be an empty string, if there is no more appropriate 
>value". To me None is "a more appropriate value" sometimes, so I suppose I 
>could legitimately interpret that to mean that I can use None values in my 
>WSGI-compliant framework, because my server infrastructure allows me to 
>detect their absence or lack of value.
>
>So perhaps either the wording of the spec needs to be tightened up to 
>exclude this? Or the default environment values need to be more clearly 
>specified? Or perhaps a discussion of None vs. empty string needs to be added 
>to the Q&A at the end?

I went to add this to the PEP, and found it was already there:

"""Also note that CGI-defined variables must be strings,
if they are present at all.  It is a violation of this specification
for a CGI variable's value to be of any type other than ``str``."""


>So, in terms of compliance with WSGI, am I in violation of the WSGI spec 
>by not transmitting the actual textual status message specified by the 
>application? If that's a problem, there's nothing I can do about it.

Personally, I would just document it as a minor nonconformance of your 
servlet implementation; it's not likely to be an issue in practice.


>It is a fundamental requirement (to me at least) that WSGI be able to handle 
>writing of binary data. And I'm fairly sure the intention for the write() 
>callable in WSGI is that it take python "strings", which includes strings 
>of binary data. But perhaps it needs to be made explicitly clear in the WSGI 
>spec that the write() callable explicitly writes in binary mode, i.e. that 
>no translation is taking place on byte strings passed to it, and the 
>application/user is responsible for all encoding concerns relating to byte 
>strings passed to the write() callable.

Added a note about this.


>However, even though jython is based on python 2.1, and thus doesn't have 
>built-in support for either iterators or generators, I have still 
>implemented the iterator protocol in my java/jython framework, by simply 
>invoking the .__iter__() and .next() methods on application objects, and 
>catching StopIteration exceptions. So I can support components and 
>applications returning iterators, and I'm thus compliant with the spec, 
>even though I'm running on 2.1. (This is only possible because I'm 
>embedding: it is still not possible to support the iterator protocol in, 
>say, jython for-loops)

Unfortunately, your technique doesn't actually work, unless you're also 
going to patch the Jython __builtins__ to include 'StopIteration', 'iter', 
and so forth.  You would have to use the pre-2.2 iteration protocol, which 
uses __getitem__ and IndexError.  I think this would have to be something 
you document as a spinoff or "application note" for WSGI users who must use 
a pre-2.2 version of Python.  One of the reasons we decided to go ahead and 
require 2.2.2 was to avoid having to deal with the absence of True/False, 
iterators, and generators.


>It's conceivable that even a python 1.5 framework could be programmed to 
>support the iterator protocol: it's *very* easy to implement.

But not actually *usable* in a pre-2.2 Python, because StopIteration 
doesn't exist, so code can't raise it.  If it has to import it from 
somewhere, then it can't be used with multiple WSGI servers or gateways, 
because each one is expecting a different StopIteration class.


>Would it be useful to define a WSGI variable "python.version", similar to 
>"wsgi.version", which gives the python version in effect?

-1; that's what sys.version, sys.hexversion, sys.version_info, and so on 
are for.


>In the J2EE case (and I'm sure with Apache CGI), that's very simple to 
>deal with, since the container will do its own buffering completely 
>outside your control, and send the pieces with chunked-transfer encoding 
>if necessary. So even if I put a flush on the output channel in my 
>framework, I'm only flushing it to the container's buffer: it's still not 
>guaranteed to send output back down the return socket to the client.

That is potentially a problem, since the point is to guarantee that when 
'write()' returns to the application, the output isn't going to just sit in 
the buffer while the application moves ahead with other things: it should 
be going to the client.


>I see the solution to this redirect platform-dependence problem in the 
>implementation of a platform-independent WSGI middleware component that 
>takes all responsibility for redirects. This component examines the 
>wsgi.environment present, seeking hints for the optimal way to redirect 
>the request: if mod_python is available, use the mod_python API call: if 
>modjy is available, use the getDispatcher(uri).redirect() dance, etc. If 
>none of these platform specific techniques are available, it can fall back 
>to sending a 302 or 307 response back to the client, and let the client 
>re-request the new URL.

I'm afraid internal and external redirects are *not* 
interchangeable.  Specifically, internal redirects break relative 
URLs.  So, internal redirects need to be something that's a server 
extension, and *should* be something obscure to do, because you'd better 
know what you're doing.
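
To illustrate the relative URL problem, here is a small sketch.  The 
URLs and the helper name are made up; the import fallback is only there 
so the snippet runs on either Python line:

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2

def resolve_link(requested_url, link):
    """Resolve a relative link the way a browser does: against the URL
    it asked for, since it never sees an internal redirect."""
    return urljoin(requested_url, link)

requested = "http://example.com/shop/cart/view"
served_from = "http://example.com/errors/login"   # internal redirect target

# What the browser will actually fetch for <link href="style.css">:
seen_by_client = resolve_link(requested, "style.css")
# What the page at the redirect target meant:
intended = resolve_link(served_from, "style.css")
```

With an external redirect the browser re-requests the new URL, so both 
values would be the same.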


>8. Write callable and fileno()
>==============================
>
>It is a good idea to check for the fileno() attribute on the write callable,

No, it isn't.  First of all, it's a callable, not a stream, so it won't 
have such an attribute.  Second, even if it *is* the write method of a 
stream, it's none of the application's business.

Perhaps you're confusing this with the part where the server is allowed to 
check whether the application's return value has a fileno()?


>9. Server-detected headers.
>===========================
>
>I can see the reason for servers/containers intercepting client headers 
>and translating/augmenting/deleting them. However, do we need a 
>specification of what to do with certain specified headers? As with CGI, 
>should I recognise the "Status: " header or the "Location: " header, and 
>translate it to the relevant status code, or do a redirect, respectively? 
>If I don't do those translations, won't I be breaking reams of python CGI 
>code out there that relies on Apache doing this?

Again, WSGI doesn't support internal redirects.  The spec as currently 
written doesn't consider "status" to be a header.  Meanwhile, "Location" is 
a valid HTTP header, so there's no issue there.

If you're doing a WSGI implementation, don't worry about CGI.  If the CGI 
code is ported to WSGI, then fixing these issues is part of the port.  If 
the CGI is run under a "WSGI-to-CGI" wrapper, then this is the wrapper's 
responsibility.  In no case is the interpretation of Status or Location 
headers part of the WSGI server's responsibility.
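
For what it's worth, the wrapper's side of this is small.  A sketch, 
assuming the usual CGI convention that a Location header without a 
Status header means a 302; the function name is invented, and 
local-path Location values (CGI's internal redirect) are deliberately 
not handled:

```python
def cgi_to_wsgi(cgi_headers):
    """Turn parsed CGI output headers, a list of (name, value) tuples,
    into a (status, headers) pair suitable for start_response."""
    status = None
    headers = []
    has_location = False
    for name, value in cgi_headers:
        lowered = name.lower()
        if lowered == "status":
            # "Status" is CGI-only; it becomes the WSGI status string.
            status = value.strip()
        else:
            if lowered == "location":
                has_location = True
            headers.append((name, value))
    if status is None:
        if has_location:
            status = "302 Found"
        else:
            status = "200 OK"
    return status, headers
```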


>Which makes me wonder what the "wsgi.errors" variable is for? Yes, it's 
>for writing error data. But what do we expect to happen to data that gets 
>written to it? Will it be wrapped or translated in some way, and used 
>to construct an error response to the user? Or should it be locally logged 
>by the server?

"""An output stream to which error output can be written.  For most 
servers, this will be the server's error log."""

I've just added some additional explanatory text:

``wsgi.errors``        An output stream to which error output can be
                        written, for the purpose of recording program
                        or other errors in a standardized and possibly
                        centralized location.  For many servers, this
                        will be the server's main error log.

                        Alternatively, this may be ``sys.stderr``, or
                        a log file of some sort.  The server's
                        documentation should include an explanation of
                        how to configure this or where to find the
                        recorded output.  A server or gateway may
                        supply different error streams to different
                        applications, if this is desired.


>The J2EE ServletContext for each servlet has a "log(message)" method. 
>Maybe I should just send error output there, in which case it will end in 
>the server logs?

That is probably the right place for a servlet-based WSGI gateway to write 
errors to.

From pje at telecommunity.com  Mon Aug 30 05:01:40 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 05:02:02 2004
Subject: [Web-SIG] My experiences implement WSGI on
  java/j2ee/jython.
In-Reply-To: <41328F61.20705@colorstudy.com>
References: <41327597.5060909@xhaus.com>
 <41327597.5060909@xhaus.com>
Message-ID: <5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com>

At 09:22 PM 8/29/04 -0500, Ian Bicking wrote:
>One question: should the application iterable be a Python 2.2 style 
>iterable?  I.e., it is up to Python 2.1 servers to implement the Python 
>2.2 iterator protocol themselves?  Or, should the application be 
>responsible to return an iterator, appropriate for the Python version?

How about we just add a "Using WSGI with earlier Python Versions" 
subsection to the application/implementation notes?

It would simply note that a WSGI server/gateway intended to work pre-2.2 
*must* use only a 'for' loop to iterate over an iterable returned by the 
application, and that applications needing to work pre-2.2 would have to 
implement the old-style iteration protocol.

It is *not* necessary for either the server or application to go through 
any special contortions to emulate the 2.2 iterator protocol, because 
current versions of Python still support the old iterator protocol.  See 
PEP 234:

  """For backwards compatibility, the PyObject_GetIter() function
     implements fallback semantics when its argument is a sequence that
     does not implement a tp_iter function: a lightweight sequence
     iterator object is constructed in that case which iterates over
     the items of the sequence in the natural order."""

('iter(ob)' is basically just Python for 'PyObject_GetIter(ob)' in C.)
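
A quick sketch of what that means in practice.  The class and function 
names are made up for illustration; the object has no __iter__ at all, 
only the old-style __getitem__:

```python
class OldStyleBody:
    """Response body using the pre-2.2 protocol: __getitem__ is called
    with 0, 1, 2, ... until it raises IndexError."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __getitem__(self, index):
        # The underlying list raises IndexError past the end,
        # which is what terminates the loop.
        return self.chunks[index]

def collect(iterable):
    """What a server does: a plain 'for' loop, nothing else."""
    out = []
    for chunk in iterable:
        out.append(chunk)
    return out
```

The same 'for' loop works whether the application returned this, a list, 
or (on 2.2 and later) a generator.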

From pje at telecommunity.com  Mon Aug 30 05:16:00 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 05:16:18 2004
Subject: [Web-SIG] Other kinds of environment variables
In-Reply-To: <5.1.1.6.0.20040827000752.0239b2b0@mail.telecommunity.com>
References: <6BBA3664-F7DB-11D8-82BE-000A95BD86C0@mnot.net>
	<5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
	<5.1.1.6.0.20040826212629.02641e00@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040829231521.022907a0@mail.telecommunity.com>

At 12:11 AM 8/27/04 -0400, Phillip J. Eby wrote:
>At 08:44 PM 8/26/04 -0700, Mark Nottingham wrote:
>>Digest auth sucks much less, and also uses REMOTE_USER.
>
>As I said, REMOTE_USER in a CGI environment leads to nasty local-system 
>security holes: potentially a local user can just set 
>REMOTE_USER=whoeverIwantToBe and invoke the application.
>
>Maybe we should, however, have a configuration key for 
>'wsgi.auth_available' that indicates the availability of the 
>HTTP_AUTHORIZATION header.  Absence of 'wsgi.auth_available' would mean 
>that the availability is unknown, while true or false would indicate 
>definite availability or lack thereof.

Nobody's responded to this; does that mean you all think it's a brilliant 
idea?  ;)
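
For concreteness, here is how an application might read such a key.  
The key is only a proposal at this point, and the function name is 
invented:

```python
def auth_header_status(environ):
    """Interpret the proposed 'wsgi.auth_available' key as a
    three-way answer: absent means the server cannot say."""
    flag = environ.get("wsgi.auth_available")
    if flag is None:
        return "unknown"
    if flag:
        return "available"
    return "unavailable"
```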

From pje at telecommunity.com  Mon Aug 30 05:25:47 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 05:26:06 2004
Subject: [Web-SIG] Pending modifications to PEP 333
Message-ID: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>

Here are some changes I've proposed in the last few days to resolve issues 
people brought up, but which I haven't gotten much feedback on:

* 'wsgi.fatal_errors' key for exceptions that apps and middleware shouldn't 
trap

* 'wsgi.auth_available' flag

* Make the 'headers' object an 'email.Message' (well, there's been some 
feedback, but I think I addressed the concerns, and there was no feedback 
since)

* what should a server or gateway's default error handling be, for each of 
the eight contexts in which an exception can occur?

* notes on writing pre-2.2 compatible iteration code

* anything else?

I'd really like to get everything but the HTTP/1.1-specific stuff (which 
Mark Nottingham is working on) wrapped up early this week, if possible.  So 
far, there has been surprisingly little comment on the PEP either from 
c.l.py or python-dev, so I'm going to take their silence to mean that the 
PEP is basically perfect, apart from the currently known issues.  ;)

From ianb at colorstudy.com  Mon Aug 30 05:33:35 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 05:33:42 2004
Subject: [Web-SIG] My experiences implement WSGI on  java/j2ee/jython.
In-Reply-To: <5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com>
References: <41327597.5060909@xhaus.com> <41327597.5060909@xhaus.com>
	<5.1.1.6.0.20040829225537.02740d10@mail.telecommunity.com>
Message-ID: <4132A00F.8010706@colorstudy.com>

Phillip J. Eby wrote:
> How about we just add a "Using WSGI with earlier Python Versions" 
> subsection to the application/implementation notes?
> 
> It would simply note that a WSGI server/gateway intended to work pre-2.2 
> *must* use only a 'for' loop to iterate over an iterable returned by the 
> application, and that applications needing to work pre-2.2 would have to 
> implement the old-style iteration protocol.

This would mean that applications would have to be written with backward 
compatibility in mind.  Which may not be terribly unreasonable.  But I 
don't see any reasonable way you can write version-neutral code.

For instance, file objects are not iterable in older Pythons, so you 
can't return those.  That's pretty annoying.  And there's no method that 
is invoked which warns you that you need to be backward-compatible -- 
__iter__ is called on newer Pythons, but nothing on older ones.

Of course, those same functions I put in the other email could be 
applied on the application side, maybe conditionally depending on Python 
version.

In practical terms, though, I suspect servers are going to be more 
aware of their target Python version than applications.  So server 
authors are going to have more incentive to write the code to deal with 
older Python versions.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Mon Aug 30 05:38:57 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 05:39:05 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <4132A151.1000006@colorstudy.com>

Phillip J. Eby wrote:
> Here are some changes I've proposed in the last few days to resolve 
> issues people brought up, but which I haven't gotten much feedback on:
> 
> * 'wsgi.fatal_errors' key for exceptions that apps and middleware 
> shouldn't trap
> 
 > * what should a server or gateway's default error handling be, for each
 > of the eight contexts in which an exception can occur?

Those are hard problems.  Lots of thought.  I haven't done much thought 
on it, so I don't have any comments.

> * 'wsgi.auth_available' flag

Sure.

> * Make the 'headers' object an 'email.Message' (well, there's been some 
> feedback, but I think I addressed the concerns, and there was no 
> feedback since)

I'm -0 on email.Message.

> * notes on writing pre-2.2 compatible iteration code

I'd rather allow lazier applications and put more of the pre-2.2 
compatibility work in the hands of the server.

> * anything else?

Integer status code?  And the Status header.  I'm -0 on a status header. 
  I'm +1 on integer status code.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Mon Aug 30 06:12:18 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 06:12:25 2004
Subject: [Web-SIG] Stuff left to be done on WSGI
In-Reply-To: <5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com>
References: <5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827174306.02e82ae0@mail.telecommunity.com>
	<5.1.1.6.0.20040827224215.027473f0@mail.telecommunity.com>
	<5.1.1.6.0.20040828121031.03326890@mail.telecommunity.com>
Message-ID: <4132A922.7070008@colorstudy.com>

Phillip J. Eby wrote:
> At 11:51 PM 8/27/04 -0500, Ian Bicking wrote:
> 
>> I don't know if we need deeper hierarchy than that.  E.g., 
>> web.wsgi.cgiadapter.  I don't think so.  I'd rather "WSGI" be a term 
>> only those in the know use -- it means nothing unless you expand the 
>> acronym, and even then it's pretty vague.  Ultimately I hope most web 
>> programmers just don't need to think about any of it.
> 
> 
> Flat is better than nested; let's not mix other projects into this.  The 
> WSGI stuff will have enough content to deserve a package of its own, and 
> we don't want it to be dependent upon a bunch of "next generation" stuff 
> that's not even designed yet.

Will it really?  And how will it be organized?  There's some utility 
functions, which don't deserve a module.  There's WSGIHTTPServer, based 
on BaseHTTPServer.  And maybe some CGI WSGI server.

I imagine other things could come along, but not right away, and where 
would they go?  Added to some top-level module?  A new module?

I also *really* dislike the name wsgi for a module.  It's a fine name 
for discussing this, but I'm really opposed to it becoming a name used 
more widely.  Not because I think there's a better name, but because the 
function is important and the name isn't.  One of the things we can do 
if this is an approved PEP is that we don't have to qualify this as 
one-of-many, using a distinguishing name.

>> Yes, you are right.  Which means the catcher has to keep track of the 
>> headers that were sent if it hopes to do anything.  In that case, it 
>> might check for text/html or text/plain; if not those two, then just 
>> stop the response short and log the error.  If so, and if configured 
>> to show errors, then it could display them; cgitb goes to some length 
>> to make HTML render correctly.
>>
>> That makes me think that wrapping start_response is more reasonable. 
>> Though it makes error resolution in servers more complex.
> 
> 
> I'm not sure I follow you.  The error handling in the server would look 
> just like the handling in middleware, no?  In fact, this potentially 
> sounds like a job for another boilerplate function in wsgi.util, or 
> perhaps a class.  I imagine we might have an AbstractWSGIServer that 
> defines basic start-response, write, and other operations, with abstract 
> methods for sending/receiving data to and from the client, and various 
> overrideable methods for policy.  The simple WSGIServer and CGI gateway 
> would both derive from it, or perhaps delegate to it.

To me that feels like it makes implementation more complicated, rather 
than less.  Maybe not really, but I think it will *feel* more 
complicated.  I think a good example is more helpful to authors.  All 
these issues are very much part of the control flow, and abstracting 
control flow leads (IMHO) to confusing class structures.

>>> set_charset/get_charset -- sets the character set parameters of the 
>>> content-type, which is actually useful.  On the down side, setting 
>>> the character set sets MIME-Version, but it also sets the 
>>> Content-Transfer-Encoding, so it doesn't force the server to default 
>>> one.
>>
>>
>> Would that start opening up the possibility of accepting Unicode to 
>> write()/app_iter?
> 
> 
> In my view, no, because then we'd force the server to know about every 
> possible encoding the client and app can come up with.  If the app uses 
> this, it should handle the encoding.  We might want to include a utility 
> routine or two to pull what the client accepts out of HTTP_ACCEPT et al.

Python seems to be pretty good at dealing with a lot of different 
encodings.  A lot of work on this has gone into the base Python 
distribution -- I don't think there's any better source of code on encoding.

It opens up a big can of worms, so I don't mind ignoring encoding, but 
maybe that's just because I'm American and I'm lazy and usually ignore 
encoding, so it's mysterious to me.

>>> __len__, __getitem__, __setitem__, __delitem__, __contains__, 
>>> has_key, get, keys, values, items -- case-insensitive dictionary-like 
>>> interface (i.e., the stuff we mainly want)
>>> get_all -- all values for a header name
>>> add_header, replace_header -- more stuff we want
>>
>>
>> Very good, though not hard to reimplement.
> 
> 
> But why should everybody reimplement it, if we're not going to be in the 
> stdlib till 2005?

Well, if we already have utility functions, this is just a utility 
class.  And it would be very small and easy to understand.  Smaller 
and easier to understand than email.Message, certainly, and with no 
distracting vestigial pieces.
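
Something on this order is what I'm picturing.  It's a sketch, not a 
proposal for the exact interface; the method names borrow from the list 
quoted above, and missing keys raise KeyError here, which is my own 
choice:

```python
class Headers:
    """Case-insensitive access to a list of (name, value) tuples."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def get_all(self, name):
        """All values for a header name, in order."""
        name = name.lower()
        return [v for k, v in self.items if k.lower() == name]

    def __contains__(self, name):
        return len(self.get_all(name)) > 0

    def __getitem__(self, name):
        values = self.get_all(name)
        if not values:
            raise KeyError(name)
        return values[0]

    def __delitem__(self, name):
        name = name.lower()
        self.items = [(k, v) for k, v in self.items if k.lower() != name]

    def __setitem__(self, name, value):
        # Replace every existing value for this name.
        del self[name]
        self.items.append((name, value))

    def add_header(self, name, value):
        # Append without disturbing existing values (e.g. Set-Cookie).
        self.items.append((name, value))
```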

>> Okay, looking through the code briefly, I can't help but think that 
>> all the complex parts are parts we don't care about.
> 
> 
> Not so; content-type parameter setting is quite handy.  For example, if 
> you're doing multipart push, you'll need e.g. set_boundary and 
> get_boundary might also be useful.
> 
> 
>>> Well, to some extent we have to look at the question of what should 
>>> happen in those circumstances anyway, whether we solve the problem in 
>>> that specific way or not.  Because if the application *does* call 
>>> start_response more than once, the server has to be able to handle it 
>>> *somehow*.  Really, the ultimate error handling *has* to be done by 
>>> servers, unless they want to take the route of crashing the entire 
>>> process when something bad happens.  :)
>>
>>
>> Good question.  I think servers should consider that an error, but 
>> they should handle that error gracefully.  Which probably means 
>> keeping a "has start_response already been called" flag.
>>
>> Now, if I could get access to that flag from middleware... and maybe 
>> access to the headers and status that have already been sent... (and 
>> really, why not?  We aren't worried about streaming headers like we 
>> are about bodies)
> 
> 
> You dodged my question...  what are you going to *do* with that?  
> Because we need to formulate sensible error handling policies for the 
> general case, including things like an I/O error due to the client 
> disconnecting.

Well, in some cases I would try to display errors to the client.  Though 
maybe a class of errors -- particularly those that happen during the 
iteration phase, or after start_response -- could just go to a log. 
OTOH, I'd want to show *some* indication to the client that an error has 
occurred, and the response is incomplete, at least for human-readable 
content (text/html and maybe text/plain).

But not in all cases, like I/O error.  OTOH, I might log errors *only* 
when I couldn't display them to the client (during development).
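
The content-type test itself would be simple, assuming the catcher has 
kept the headers that were passed to start_response.  A sketch, with an 
invented function name:

```python
def can_show_error(headers):
    """headers: list of (name, value) tuples already given to
    start_response.  True only for human-readable content types."""
    for name, value in headers:
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            media_type = value.split(";", 1)[0].strip().lower()
            return media_type in ("text/html", "text/plain")
    return False
```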

> Here are possible loci of error:
> 
>    * Before start_response is called (application error)

Easy to handle.  Display a traceback, or a technical-problems error 
message and log the error.

>    * During start_response (server error or application error)

What application errors are you thinking of?  Like invoking 
start_response incorrectly?

Server errors should probably be handled by the server.  It might be 
nice if the server always raised a single exception (say, 
WSGIServerError), so a start_response definition might look like:

def start_response(status, headers):
     try:
         blah blah
     except ServerIOError:
         do something
         raise WSGIServerError

And applications shouldn't catch (or should re-raise) a server error.
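
A slightly fuller sketch of that idea, in case it helps. WSGIServerError and make_start_response are hypothetical names, nothing here is in the PEP, and `stream` simply stands in for whatever the server really writes to:

```python
class WSGIServerError(Exception):
    """Hypothetical: the single exception a server raises for its own failures."""


def make_start_response(stream):
    """Build a start_response that writes the status line and headers.

    `stream` is any object with a write() method taking bytes (an
    assumption for this sketch; a real server has its own transport).
    """
    def start_response(status, headers):
        lines = ["HTTP/1.0 %s" % status]
        lines.extend("%s: %s" % (name, value) for name, value in headers)
        data = ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
        try:
            stream.write(data)
        except OSError as exc:
            # Server-side I/O failure (e.g. the client went away): hide the
            # concrete type behind the one exception applications know about.
            raise WSGIServerError(str(exc))
    return start_response
```

An application or middleware would then only need to let WSGIServerError pass through untouched, whatever the underlying failure was.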

>    * After start_response, before first write  (application error)

I'd like the option here to display an error to the client, dependent on 
the content-type.

>    * During a write (server error or application error)

Another WSGIServerError?

>    * Between writes, before return (application error)

Depending on content-type, a last write would be good.

>    * After return/during iteration (application error)

Again, depending on content-type, a last write (well, iteration) would 
be nice.  Less important generally.

>    * During a post-return write (server error or application error)

I'm not sure what you're thinking here?

>    * During 'close()' (application error)

Logged to wsgi.errors, nothing else.

> The reason those are "server or application" is because start_response 
> and write can fail due to bad data passed by the application, so it's 
> really an application error in that case.  The server might fail for 
> some other reason, of course, like a lost client connection.
>
> One issue here is that an application or middleware error handler needs 
> to know whether the error is the application's or the server's.  It 
> makes no sense for a failed write to cause a middleware error handler to 
> attempt to write some more data!  It seems we need an error parameter like:
> 
>    environ['wsgi.fatal_errors'] = SomeExceptionClass1, SomeExceptionClass2,...
> 
> Such that one would use:
> 
>    try:
>        # invoke child application, etc.
>    except environ['wsgi.fatal_errors']:
>        raise
>    except:
>        # regular error handling here
> 
> In other words, an application or middleware component should abort if 
> it receives one of these exception types.  I'm inclined to think that 
> application WSGI programming errors should be treated as fatal: if the 
> app sends bad parameters to start_response or write, there's little 
> point in proceeding further.

Hmm... that would work too.  Then the type of the exception wouldn't be 
lost, though servers would also be able to encode the type inside a 
single exception.  OTOH, by using a tuple there, you could avoid 
requiring any wsgi module which defines this particular exception.

I would probably call these "server_errors" rather than "fatal_errors", 
though I guess it amounts to the same thing.
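
For what it's worth, here is how I imagine middleware using such a tuple. This is only a sketch: 'wsgi.fatal_errors' is the key proposed above and not an established part of the spec, error_catching_middleware is a made-up name, and it only covers the easy case where the error happens before start_response has been called:

```python
def error_catching_middleware(app):
    """Wrap `app`; report ordinary errors, let server errors propagate.

    Assumes the server put a tuple of exception classes in
    environ['wsgi.fatal_errors'] (a proposed key, used here hypothetically).
    """
    def wrapper(environ, start_response):
        fatal = environ.get('wsgi.fatal_errors', ())
        try:
            return app(environ, start_response)
        except fatal:
            # The server itself failed; writing more output makes no sense.
            raise
        except Exception:
            # Ordinary application error: show something to the client.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"A server error occurred.\n"]
    return wrapper
```

Because the value is a plain tuple of classes, the middleware never has to import anything from a shared wsgi module to write the `except` clause.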

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From floydophone at gmail.com  Mon Aug 30 06:16:54 2004
From: floydophone at gmail.com (Peter Hunt)
Date: Mon Aug 30 06:16:57 2004
Subject: [Web-SIG] My repository of WSGI code
Message-ID: <6654eac4040829211664129ec@mail.gmail.com>

http://st0rm.hopto.org/wsgi/

The files listed there are:
- jonpy_wsgi.py - wsgi to jonpy adapter
- test.cgi.py - test jonpy application using wsgi
- wsgicgi.py - run a wsgi application under 
- WSGIHTTPServer.py - copycat of CGIHTTPServer, except it runs WSGI apps
- testhttpserver.py - tests the WSGIHTTPServer.py class

Please submit any patches/comments. Perhaps we could improve upon
these scripts and include them in the distribution?
From ods at strana.ru  Mon Aug 30 11:06:49 2004
From: ods at strana.ru (Denis S. Otkidach)
Date: Mon Aug 30 11:12:12 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <4132A151.1000006@colorstudy.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
Message-ID: <20040830130649.74f1826f.ods@strana.ru>

On Sun, 29 Aug 2004 22:38:57 -0500
Ian Bicking <ianb@colorstudy.com> wrote:

> > * Make the 'headers' object an 'email.Message' (well, there's been some 
> > feedback, but I think I addressed the concerns, and there was no 
> > feedback since)
> 
> I'm -0 on email.Message.

Below is a class we have used for headers in our framework for several years.
I guess it's more comfortable than a list of tuples or email.Message.
Anyway, we only have to fix the "must have" interface, not all the useful
methods.


class Headers:
    '''Dictionary-like object of HTTP headers with case insensitive key lookup
    and add() method. The order of headers is preserved.'''

    def __init__(self, data={}):
        self._headers =  []
        self._headers_map = {}
        if data:
            if isinstance(data, dict):
                # From dictionary
                for key, value in data.iteritems():
                    self.add(key, value)
            else:
                # from any sequence of pairs
                for key, value in data:
                    self.add(key, value)
            # XXX Here can be initialization from other types: string, file.

    def __iter__(self):
        return iter(self._headers)

    def __len__(self):
        return len(self._headers)

    def keys(self):
        return self._headers_map.keys()

    def has_key(self, key):
        return self._headers_map.has_key(key)

    def add(self, key, value):
        self._headers.append((key, value))
        self._headers_map.setdefault(key.lower(), []).append(value)

    def __getitem__(self, key):
        '''Get header. If there are several headers with the same key, their
        values are joined.'''
        # RFC 2616, 4.2 Message Headers
        return ', '.join(self._headers_map[key.lower()])

    def __setitem__(self, key, value):
        '''Replace headers with the same key.'''
        del self[key]
        self.add(key, value)

    def __delitem__(self, key):
        '''Delete all headers with this key. Never fail.'''
        key = key.lower()
        if self._headers_map.has_key(key):
            del self._headers_map[key]
            self._headers = [(k, v) for (k, v) in self._headers
                                    if k.lower()!=key]

    def __str__(self):
        return '\r\n'.join(['%s: %s' % h for h in self._headers])+'\r\n'

-- 
Denis S. Otkidach
http://www.python.ru/      [ru]
From pje at telecommunity.com  Mon Aug 30 15:33:14 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 15:33:42 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <20040830130649.74f1826f.ods@strana.ru>
References: <4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
Message-ID: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>

At 01:06 PM 8/30/04 +0400, Denis S. Otkidach wrote:
>On Sun, 29 Aug 2004 22:38:57 -0500
>Ian Bicking <ianb@colorstudy.com> wrote:
>
> > > * Make the 'headers' object an 'email.Message' (well, there's been some
> > > feedback, but I think I addressed the concerns, and there was no
> > > feedback since)
> >
> > I'm -0 on email.Message.
>
>Below is a class we use for headers in our framework for several years.
>I guess it's more comfortable than list of tuples or email.Message.
>Anyway, we have to fix only "must have" interface, but not all useful
>methods.

Hi Denis; thanks for the input.  Unfortunately, WSGI needs to either use a 
class/type that's available in the Python standard library, or else a 
simple protocol like "sequence of name,value pairs".

From wilk-ml at flibuste.net  Mon Aug 30 16:01:53 2004
From: wilk-ml at flibuste.net (William Dode)
Date: Mon Aug 30 16:02:01 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
	(Phillip J. Eby's message of "Mon, 30 Aug 2004 09:33:14 -0400")
References: <4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
Message-ID: <87isb0912m.fsf@blakie.riol>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> At 01:06 PM 8/30/04 +0400, Denis S. Otkidach wrote:
>>On Sun, 29 Aug 2004 22:38:57 -0500
>>Ian Bicking <ianb@colorstudy.com> wrote:
>>
>> > > * Make the 'headers' object an 'email.Message' (well, there's been some
>> > > feedback, but I think I addressed the concerns, and there was no
>> > > feedback since)
>> >
>> > I'm -0 on email.Message.
>>
>>Below is a class we use for headers in our framework for several years.
>>I guess it's more comfortable than list of tuples or email.Message.
>>Anyway, we have to fix only "must have" interface, but not all useful
>>methods.
>
> Hi Denis; thanks for the input.  Unfortunately, WSGI needs to either
> use a class/type that's available in the Python standard library, or
> else a simple protocol like "sequence of name,value pairs".

I also think email.Message is overkill for this and it can be very
surprising to see an "email message" here...

-- 
William Dodé - http://flibuste.net
From steve at holdenweb.com  Mon Aug 30 16:04:52 2004
From: steve at holdenweb.com (Steve Holden)
Date: Mon Aug 30 16:07:25 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <20040830130649.74f1826f.ods@strana.ru>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>	<4132A151.1000006@colorstudy.com>
	<20040830130649.74f1826f.ods@strana.ru>
Message-ID: <41333404.6070508@holdenweb.com>

Denis S. Otkidach wrote:

> On Sun, 29 Aug 2004 22:38:57 -0500
> Ian Bicking <ianb@colorstudy.com> wrote:
> 
> 
>>>* Make the 'headers' object an 'email.Message' (well, there's been some 
>>>feedback, but I think I addressed the concerns, and there was no 
>>>feedback since)
>>
>>I'm -0 on email.Message.
> 
> 
> Below is a class we use for headers in our framework for several years.
> I guess it's more comfortable than list of tuples or email.Message.
> Anyway, we have to fix only "must have" interface, but not all useful
> methods.
> 
> 
[...]
> 
>     def __getitem__(self, key):
>         '''Get header. If there are several header with the same key, their
>         values are joined.'''
>         # RFC 2616, 4.2 Message Headers
>         return ', '.join(self._headers_map[key.lower()])
> 
[...]
Since this module has seen production use, can we take it you've had no 
problem joining cookie values with dates containing commas? This was one 
of the arguments for maintaining separate multiple headers of the same 
type, IIRC.

regards
  Steve

-- 
XXX Please note recent change of email address

From wilk-ml at flibuste.net  Mon Aug 30 16:58:49 2004
From: wilk-ml at flibuste.net (William Dode)
Date: Mon Aug 30 16:58:51 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	(Phillip J. Eby's message of "Sun, 29 Aug 2004 23:25:47 -0400")
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <87eklo8yfq.fsf@blakie.riol>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> So far, there has been surprisingly little comment on the
> PEP either from c.l.py or python-dev, so I'm going to take their
> silence to mean that the PEP is basically perfect, apart from the
> currently known issues.  ;)

First, thank you (and the others) for the great work.

Like a lot of people, I think, I wrote my own modest framework, because
it fits my needs and it's not difficult to do. I don't think it's a
problem to still have a lot of frameworks in the community; each one is
very specific, and it's not difficult to write one's own framework "aux
petits oignons". But it's more difficult to write a server: everybody
makes their own hack on top of BaseHTTPServer and reinvents the wheel.
It's also because everybody has to adapt their framework to
BaseHTTPServer that this server doesn't evolve in the standard library;
the same goes for cgi.

So when servers follow this specification it'll be a breath of fresh
air! I'll keep my framework and throw away my servers. You've found a
really good approach with this gateway :-)

-- 
William Dodé - http://flibuste.net
From ods at strana.ru  Mon Aug 30 17:33:09 2004
From: ods at strana.ru (Denis S. Otkidach)
Date: Mon Aug 30 17:38:19 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <41333404.6070508@holdenweb.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
	<20040830130649.74f1826f.ods@strana.ru>
	<41333404.6070508@holdenweb.com>
Message-ID: <20040830193309.1baa0b01.ods@strana.ru>

On Mon, 30 Aug 2004 10:04:52 -0400
Steve Holden <steve@holdenweb.com> wrote:

> [...]
> > 
> >     def __getitem__(self, key):
> >         '''Get header. If there are several header with the same key, their
> >         values are joined.'''
> >         # RFC 2616, 4.2 Message Headers
> >         return ', '.join(self._headers_map[key.lower()])
> > 
> [...]
> Since this module has seen productions use, can we take it you've had no 
> problem joining cookie values with dates containing commas? This was one 
> of the arguments for maintaining separate multiple headers of the same 
> type, IIRC.

As you can see, we do maintain separate headers with the same name, so there
is no problem with the Set-Cookie header: iteration and __str__ emit each one
separately, and only __getitem__ joins the values.  There should be a method
like FieldStorage.getlist() for completeness, but we never needed it.
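
To make the distinction concrete, here is a cut-down sketch of the relevant parts of the class, rewritten so it runs on a current Python. The getlist() method is a hypothetical addition along the lines mentioned above, not something the posted class has:

```python
class Headers:
    """Cut-down rendition of the posted class, plus a hypothetical getlist()."""

    def __init__(self):
        self._headers = []        # (name, value) pairs in insertion order
        self._headers_map = {}    # lower-cased name -> list of values

    def add(self, key, value):
        self._headers.append((key, value))
        self._headers_map.setdefault(key.lower(), []).append(value)

    def __iter__(self):
        # Iteration yields every header separately, in order.
        return iter(self._headers)

    def __getitem__(self, key):
        # Joined form, as RFC 2616 section 4.2 allows for list-valued headers.
        return ', '.join(self._headers_map[key.lower()])

    def getlist(self, key):
        # Hypothetical addition: the separate values, never joined.
        return list(self._headers_map.get(key.lower(), []))


h = Headers()
h.add('Set-Cookie', 'a=1; expires=Wed, 01-Sep-2004 00:00:00 GMT')
h.add('Set-Cookie', 'b=2')
```

Iterating over `h` (which is what a server would do when sending the response) gives two distinct Set-Cookie pairs; only `h['Set-Cookie']` produces the comma-joined string that is ambiguous for cookies with dates.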

-- 
Denis S. Otkidach
http://www.python.ru/      [ru]
From ods at strana.ru  Mon Aug 30 17:38:40 2004
From: ods at strana.ru (Denis S. Otkidach)
Date: Mon Aug 30 17:43:49 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
References: <4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
Message-ID: <20040830193840.6b12ae9b.ods@strana.ru>

On Mon, 30 Aug 2004 09:33:14 -0400
"Phillip J. Eby" <pje@telecommunity.com> wrote:

> >Below is a class we use for headers in our framework for several years.
> >I guess it's more comfortable than list of tuples or email.Message.
> >Anyway, we have to fix only "must have" interface, but not all useful
> >methods.
> 
> Hi Denis; thanks for the input.  Unfortunately, WSGI needs to either use a 
> class/type that's available in the Python standard library, or else a 
> simple protocol like "sequence of name,value pairs".

"sequence of name,value pairs" is OK - my class satisfies this interface if,
by "sequence", you mean just an iterable object and not a real list.
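
In other words, if the server side only ever does something like the following, any iterable of pairs will do. This is just a sketch; format_headers is a made-up helper, and whether the spec should accept any iterable or insist on a real list is exactly the open question:

```python
def format_headers(headers):
    """Render any iterable of (name, value) pairs as an HTTP header block."""
    return ''.join('%s: %s\r\n' % (name, value) for name, value in headers)


# A real list, a tuple and a generator are all consumed the same way.
as_list = format_headers([('Content-Type', 'text/html'), ('X-A', '1')])
as_tuple = format_headers((('Content-Type', 'text/html'), ('X-A', '1')))
as_generator = format_headers(pair for pair in [('Content-Type', 'text/html'),
                                                ('X-A', '1')])
```

A server that needs len() or indexing on the headers would, of course, require more than this.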

-- 
Denis S. Otkidach
http://www.python.ru/      [ru]
From wilk-ml at flibuste.net  Mon Aug 30 19:10:55 2004
From: wilk-ml at flibuste.net (William Dode)
Date: Mon Aug 30 19:11:12 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	(Phillip J. Eby's message of "Sun, 29 Aug 2004 23:25:47 -0400")
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <87sma47dr4.fsf@blakie.riol>

"Phillip J. Eby" <pje@telecommunity.com> writes:

>  So far, there has been surprisingly little comment on the
> PEP either from c.l.py or python-dev, so I'm going to take their
> silence to mean that the PEP is basically perfect, apart from the
> currently known issues.  ;)

One important thing, of course, is that current frameworks and servers
implement the spec. Once the most famous ones begin, the others will
follow, but who will begin? Shall we ask them on their mailing lists?
Are they here?

-- 
William Dodé - http://flibuste.net
From fumanchu at amor.org  Mon Aug 30 19:11:14 2004
From: fumanchu at amor.org (Robert Brewer)
Date: Mon Aug 30 19:16:50 2004
Subject: [Web-SIG] Pending modifications to PEP 333
Message-ID: <3A81C87DC164034AA4E2DDFE11D258E3022E98@exchange.hqamor.amorhq.net>

William Dode wrote:
> "Phillip J. Eby" <pje@telecommunity.com> writes:
> 
> >  So far, there has been surprisingly little comment on the
> > PEP either from c.l.py or python-dev, so I'm going to take their
> > silence to mean that the PEP is basically perfect, apart from the
> > currently known issues.  ;)
> 
> One important things of course is that current frameworks and servers
> implement the specs. When the most famous will begin, the others will
> follow, but who will begin ? Shall we ask them on their mailing-list ?
> Are they here ?

The intermediate step for me as a framework writer is to write my own
WSGI wrapper for mod_python, for example, so that when mod_python grows
its own WSGI interface, the replacement will be nearly seamless. I
expect others are doing the same, if only for testing purposes, so I
don't think we're in a huge rush.

But yes, some of the "more famous" server authors are here and gave
input on the spec.


Robert Brewer
MIS
Amor Ministries
fumanchu@amor.org
From py-web-sig at xhaus.com  Mon Aug 30 22:02:57 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Aug 30 21:58:27 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
Message-ID: <413387F1.7010804@xhaus.com>

[Alan Kennedy]
 >> However, even though jython is based on python 2.1, and thus doesn't
 >> have built-in support for either iterators or generators, I have still
 >> implemented the iterator protocol in my java/jython framework

[Phillip J. Eby]
 > Unfortunately, your technique doesn't actually work, unless you're also
 > going to patch the Jython __builtins__ to include 'StopIteration',
 > 'iter', and so forth.  You would have to use the pre-2.2 iteration
 > protocol, which uses __getitem__ and IndexError.  I think this would
 > have to be something you document as a spinoff or "application note" for
 > WSGI users who must use a pre-2.2 version of Python.  One of the reasons
 > we decided to go ahead and require 2.2.2 was to avoid having to deal
 > with the absence of True/False, iterators, and generators.

[Ian Bicking]
 > That's an interesting question.  I guess with both Jython and Zope
 > 2.6 and earlier being Python 2.1, it should be given some consideration.
 >
 > One question: should the application iterable be a Python 2.2 style
 > iterable?  I.e., it is up to Python 2.1 servers to implement the Python
 > 2.2 iterator protocol themselves?  Or, should the application be
 > responsible to return an iterator, appropriate for the Python version?
 >
 > In Python <2.2 (including 1.5.2) the protocol was that you called
 > __getitem__ with ever-increasing integers, until an IndexError was
 > raised.  There was no concept of a special __iter__() function.  But I
 > guess Python 2.2's iter() builtin could be simulated:

Well, now I'm confused :-)

Firstly, my 2.1 implementation of the 2.2 iterator protocol does work, 
because I do create a StopIteration exception and poke it into 
__builtin__. Which isn't the prettiest of approaches, but it works.

I'm currently testing on an application object defined like this:

################
class handler:

   def __init__(self, environ, start_response):
     start_response("200 OK", [])
     self.i = 0

   def __iter__(self):
     return self

   def next(self):
     if self.i < 6:
       self.i += 1
       return "<h%d>Hello WSGI World!</h%d>\n" % (self.i, self.i)
     else:
       raise StopIteration()
#################

And it works as expected: as I expected ;-)

So the two consequent questions I have are

1. Is there something wrong with my approach of defining a StopIteration 
exception, and poking it into __builtin__?

2. Do I need to implement the old pre-2.2 iterator protocol as well? It 
had never occurred to me to implement that: I was focussed only on 2.2 
iterators.

While we're on the subject of python 2.2 requisites, it's also trivial 
for me to define True and False. Which leaves generators as the only 2.2 
facility I can't do anything about. But since generators are optional 
for application/middleware authors, doesn't that mean that 2.2.2 is not 
required as the minimum version for framework authors, only for 
2.2-dependent components that are plugged into their framework?

Keep up the good work!

Regards,

Alan.
From pje at telecommunity.com  Mon Aug 30 22:18:43 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon Aug 30 22:18:13 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <413387F1.7010804@xhaus.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>

At 09:02 PM 8/30/04 +0100, Alan Kennedy wrote:

>So the two consequent questions I have are
>
>1. Is there something wrong with my approach of defining a StopIteration 
>exception, and poking it into __builtin__?

Yes; it won't work with anything else that pokes its own StopIteration into 
__builtin__.  This is very fragile; don't do it.


>2. Do I need to implement the old pre-2.2 iterator protocol as well? It 
>had never occurred to me to implement that: I was focussed only on 2.2 
>iterators.

If you're writing a server or gateway, you don't need to implement it at 
all: use a "for" loop to iterate over the iterable, and all will be well.

If you're writing an application that must work under pre-2.2 Python, you 
must implement the *old* iterator protocol, and only that protocol.  You do 
not have to implement the new iterator protocol "as well".  Implement the 
old protocol *instead*.

Following these guidelines will make your code both "forward" and 
"backward" compatible, since newer Pythons still recognize the old iterator 
protocol.
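
As a rough illustration of the application side (a sketch only; OldStyleBody and collect are made-up names), a response body that implements nothing but the old protocol is still consumed correctly by a plain "for" loop, on old and new Pythons alike:

```python
class OldStyleBody:
    """Response body using the pre-2.2 protocol: __getitem__ plus IndexError."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # Called with 0, 1, 2, ... until IndexError is raised.
        if index >= self.count:
            raise IndexError(index)
        level = index + 1
        return "<h%d>Hello WSGI World!</h%d>\n" % (level, level)


def collect(body):
    """What a server or gateway does: a plain for loop over the iterable."""
    chunks = []
    for chunk in body:
        chunks.append(chunk)
    return chunks
```

Note that OldStyleBody defines neither __iter__ nor next, and never mentions StopIteration, so it needs nothing poked into __builtin__.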


>While we're on the subject of python 2.2 requisites, it's also trivial for 
>me to define True and False. Which leaves generators as the only 2.2 
>facility I can't do anything about. But since generators are optional for 
>application/middleware authors, doesn't that mean that 2.2.2 is not 
>required as the minimum version for framework authors, only for 
>2.2-dependent components that are plugged into their framework?

Correct.  By the way, there's no need to define True and False either; a 
server or gateway supporting a pre-2.2.2 version of Python should just use 
1 and 0.  The PEP doesn't actually require the use of True and False, it 
just refers to "true values" and "false values".

From py-web-sig at xhaus.com  Mon Aug 30 22:25:00 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Aug 30 22:20:30 2004
Subject: [Web-SIG] Container buffering of output.
In-Reply-To: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
Message-ID: <41338D1C.9000704@xhaus.com>

[Alan Kennedy]
 >> In  .. J2EE .. the container will do its own buffering completely
 >> outside your control, and send the pieces with chunked-transfer
 >> encoding if necessary. So even if I put a flush on the output channel
 >> in my framework, I'm only flushing it to the container's buffer: it's
 >> still not guaranteed to send output back down the return socket to the
 >> client.

[Phillip J. Eby]
 > That is potentially a problem, since the point is to guarantee that when
 > 'write()' returns to the application, the output isn't going to just sit
 > in the buffer while the application moves ahead with other things: it
 > should be going to the client.

Hmmm, I don't see how it would be a problem. Although I suppose that 
depends on what you mean by "the output isn't going to just sit in the 
buffer": which buffer?

As you say, when the write() returns, the application's output has been 
sent as far as I can send it. My entire thread of execution for a 
request may have ended, and the output may still be sitting in some 
container's (i.e. Apache, Tomcat) buffer, i.e. not sent to the client: 
there's nothing I can do about that. I can call flush on my 
OutputStream, but I can't guarantee that the container will respect that 
by actually flushing to the client, for whatever reasons it may have.

This already happens with plain CGI. That's the way that containers like 
Apache and Tomcat deal with most dynamic content: buffer CGI/etc output 
until the buffer is full, then send a chunk to the client.

The behaviour of the container will probably be different if a 
Content-Length header is set: it might pass the output straight through, 
or it might still buffer it. That's container-specific.

This is all an inevitable consequence of running inside a container of 
some kind.

However, if the container were written in python, e.g. SimpleHTTPServer, 
Medusa or Twisted, they could meet the guarantee "sent down the socket 
to the client before the write() returns", because they hold the socket 
connected to the client. He who holds the socket calls the shots.

I don't see any of this presenting a problem for WSGI.
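
A toy model of what I mean, purely as a sketch: ContainerBuffer is a hypothetical stand-in for a container such as Tomcat or Apache that buffers on its own terms, and make_write is a made-up helper for the framework's write callable.

```python
class ContainerBuffer:
    """Hypothetical container-side buffer that ignores flush requests."""

    def __init__(self, size):
        self.size = size
        self.pending = b""      # accepted from the framework, not yet sent
        self.sent = []          # chunks that actually went to the client

    def write(self, data):
        self.pending += data
        # Only full buffers are pushed to the client.
        while len(self.pending) >= self.size:
            self.sent.append(self.pending[:self.size])
            self.pending = self.pending[self.size:]

    def flush(self):
        # The container may decline to send a partial buffer.
        pass


def make_write(stream):
    """WSGI-style write(): hand the data on and flush as far as we can."""
    def write(data):
        stream.write(data)
        stream.flush()
    return write
```

The framework has done everything in its power once write() returns, yet with an 8-byte buffer a 5-byte write still reaches nobody until more output arrives.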

Regards,

Alan.

From ianb at colorstudy.com  Mon Aug 30 22:35:16 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon Aug 30 22:35:37 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
Message-ID: <41338F84.6030403@colorstudy.com>

Phillip J. Eby wrote:
> At 09:02 PM 8/30/04 +0100, Alan Kennedy wrote:
> 
>> So the two consequent questions I have are
>>
>> 1. Is there something wrong with my approach of defining a 
>> StopIteration exception, and poking it into __builtin__?
> 
> 
> Yes; it won't work with anything else that pokes its own StopIteration 
> into __builtin__.  This is very fragile; don't do it.

Why?  So long as he is catching the StopIteration that is in 
__builtin__, which may or may not be the object he originally put in 
there, it should all be fine.  So maybe he should do:

try:
     StopIteration
except NameError:
     class StopIteration(Exception):
         pass
     __builtin__.StopIteration = StopIteration
     del StopIteration

-- 
Ian Bicking  /  ianb@colorstudy.com  /  http://blog.ianbicking.org
From py-web-sig at xhaus.com  Mon Aug 30 23:05:08 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Aug 30 23:00:37 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <41338F84.6030403@colorstudy.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<41338F84.6030403@colorstudy.com>
Message-ID: <41339684.9060606@xhaus.com>

[Ian Bicking]
> Why?  So long as he is catching the StopIteration that is in 
> __builtin__, which may or may not be the object he originally put in 
> there, it should all be fine.  So maybe he should do:
> 
> try:
>     StopIteration
> except NameError:
>     class StopIteration(Exception):
>         pass
>     __builtin__.StopIteration = StopIteration
>     del StopIteration

:-)

Here's my implementation: minds think alike!

private void create_stop_iteration ( )
   {
   interp.exec(
   "try:\n"+
   "  StopIteration\n"+
   "except NameError:\n"+
   "  class StopIteration(Exception): pass\n"+
   "  import sys ; sys.add_package('org.python.core')\n"+
   "  from org.python.core import __builtin__\n"+
   "  __builtin__.StopIteration = StopIteration\n"+
   "  del StopIteration\n"
     );
   }

Regards,

Alan.


From py-web-sig at xhaus.com  Mon Aug 30 22:45:03 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Mon Aug 30 23:09:54 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
Message-ID: <413391CF.4070109@xhaus.com>

Phillip,

I really am confused now by what you say. Is it possible that you're 
misunderstanding my approach?

I should make it explicitly clear that I am writing this in Java. So 
when I say I'm iterating over the iterable, I do it this way

//-------------------------------------
  PyObject iterable = app_result.invoke("__iter__");
  PyObject next_object = null;
  while (true)
   {
   try
    { next_object = iterable.invoke("next"); }
   catch (PyException pe)
    {
    // Pseudo-code here
    if pe is StopIteration: break
    }
 
   start_response_callable.write_callable.write(((PyString)next_object).toString());
   }
//-------------------------------------

So in light of that .....

[Alan Kennedy]
 >> 1. Is there something wrong with my approach of defining a
 >> StopIteration exception, and poking it into __builtin__?

[Phillip J. Eby]
 > Yes; it won't work with anything else that pokes its own StopIteration
 > into __builtin__.  This is very fragile; don't do it.

Hmm, I still don't see the problem. I've got complete control of the 
interpreter, since I am instantiating it. So I can guarantee that any 
mods I make will be made before any other code. I think of it as 
specializing the interpreter to have a new exception.

[Alan Kennedy]
 >> 2. Do I need to implement the old pre-2.2 iterator protocol as well?
 >> It had never occurred to me to implement that: I was focussed only on
 >> 2.2 iterators.

[Phillip J. Eby]
 > If you're writing a server or gateway, you don't need to implement it at
 > all: use a "for" loop to iterate over the iterable, and all will be well.

Ah, but this sentence only makes sense if I'm writing python/jython: I'm 
writing java.

 > If you're writing an application that must work under pre-2.2 Python,
 > you must implement the *old* iterator protocol, and only that protocol.
 > You do not have to implement the new iterator protocol "as well".
 > Implement the old protocol *instead*.

To me, the purpose of implementing the 2.2 iterator protocol is so that 
applications and components run inside my framework will work, if they 
support the 2.2 iterator protocol. I'm really not interested in the 
pre-2.2 protocol at all, though I suppose I should be if people want to 
run pre-2.2 iterable components in my framework.

 > Following these guidelines will make your code both "forward" and
 > "backward" compatible, since newer Pythons still recognize the old
 > iterator protocol.

To some degree, my framework *is* the python in this case.

[Alan Kennedy]
 >> While we're on the subject of python 2.2 requisites, it's also trivial
 >> for me to define True and False. Which leaves generators as the only
 >> 2.2 facility I can't do anything about. But since generators are
 >> optional for application/middleware authors, doesn't that mean that
 >> 2.2.2 is not required as the minimum version for framework authors,
 >> only for 2.2-dependent components that are plugged into their framework?

[Phillip J. Eby]
 > Correct.  By the way, there's no need to define True and False either; a
 > server or gateway supporting a pre-2.2.2 version of Python should just
 > use 1 and 0.  The PEP doesn't actually require the use of True and
 > False, it just refers to "true values" and "false values".

I think I'll set them anyway. That way, components running inside my 
framework won't break if they refer to True or False.

Kind regards,

Alan.

From pje at telecommunity.com  Tue Aug 31 01:59:37 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 01:59:15 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <413391CF.4070109@xhaus.com>
References: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com>

At 09:45 PM 8/30/04 +0100, Alan Kennedy wrote:
>I really am confused now by what you say. Is it possible that you're 
>misunderstanding my approach?

No; your approach just isn't portable, and breaks the cross-server 
compatibility that's the point of WSGI.  See below.


>I should make it explicitly clear that I am writing this in Java. So when 
>I say I'm iterating over the iterable, I do it this way
>
>//-------------------------------------
>  PyObject iterable = app_result.invoke("__iter__");
>  PyObject next_object = null;
>  while (true)
>   {
>   try
>    { next_object = iterable.invoke("next"); }
>   catch (PyException pe)
>    {
>    // Pseudo-code here
>    if pe is StopIteration: break
>    }
>start_response_callable.write_callable.write(((PyString)next_object).toString());
>   }
>//-------------------------------------
>
>So in light of that .....

...this code won't work if the application returns, say, a list.  But a 
list *would* be a perfectly valid iterable in a "normal" WSGI server or 
gateway; therefore, this approach is broken.

Meanwhile, an application that wants to support running in pre-2.2 
containers *other* than yours, is now forced to implement *both* the old 
and the new protocol!

This is clearly broken, since there's no reason to require 
backward-compatible application code to implement a protocol that isn't 
implemented by the version of Python they're trying to support.


>[Phillip J. Eby]
> > If you're writing a server or gateway, you don't need to implement it at
> > all: use a "for" loop to iterate over the iterable, and all will be well.
>
>Ah, but this sentence only makes sense if I'm writing python/jython: I'm 
>writing java.

Well, perhaps you should check whether there is a Java API you can access 
from Jython that's akin to PyObject_GetIter() in the C API, that's used in 
both Jython 2.1 and Jython 2.2; then your code will be forward and backward 
compatible without implementing both the old and the new protocols.

If there is no such API, and you want to support the 2.2 protocol, you'll 
need to hardcode both the old and new protocols, due to the fact that 
you're not coding in Python (where a simple "for" loop suffices to ensure 
portability).
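
As a rough illustration only, "hardcoding both protocols" amounts to something like the Python sketch below. The names `iterate_either` and `OldStyleBody` are made up for this example, and the `__next__` lookup is an assumption added so the sketch also runs on current Python, where the `next()` method was renamed. A Java gateway would make the equivalent calls through its own object API.

```python
def iterate_either(obj):
    """Yield items from obj via the iterator protocol, else via __getitem__."""
    get_iter = getattr(obj, "__iter__", None)
    if get_iter is not None:
        # New (2.2+) protocol: call __iter__(), then next() until StopIteration.
        iterator = get_iter()
        step = getattr(iterator, "next", None) or getattr(iterator, "__next__")
        while True:
            try:
                item = step()
            except StopIteration:
                return
            yield item
    else:
        # Old protocol: index from 0 upward until IndexError.
        index = 0
        while True:
            try:
                item = obj[index]
            except IndexError:
                return
            yield item
            index += 1


class OldStyleBody:
    """Hypothetical pre-2.2 style iterable: only __getitem__ is defined."""
    def __init__(self, chunks):
        self.chunks = chunks

    def __getitem__(self, index):
        return self.chunks[index]
```

Both `iterate_either(["a", "b"])` and `iterate_either(OldStyleBody(["a", "b"]))` then produce the same sequence of chunks, which is the behaviour a plain "for" loop gives for free in Python.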


>To me, the purpose of implementing the 2.2 iterator protocol is so that 
>applications and components run inside my framework will work, if they 
>support the 2.2 iterator protocol. I'm really not interested in the 
>pre-2.2 protocol at all, though I suppose I should be if people want to 
>run pre-2.2 iterable components in my framework.

If a piece of code is written for 2.2 and its iterator protocol, why do you 
think it'll work in your server at all?  It's far more likely that the only 
code you can run in your server will be code written for a 2.1 version of 
Python.  And such code, if it has an iterable at all, is going to be 
written to the old iterator protocol, because it will presumably want to be 
able to run in pre-2.2 CPython containers, too.  So, no matter what, *no* 
code is going to work in your server unless it was specifically written for 
your server: the exact opposite of the point of WSGI.

From ianb at colorstudy.com  Tue Aug 31 05:01:22 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Aug 31 05:01:29 2004
Subject: [Web-SIG] Status code, status header
Message-ID: <4133EA02.6090301@colorstudy.com>

After a little thought, I'm -1 on a status header, even with 
email.Message.  Mostly because at some point I believe Alan asked about 
what to do about a Location header, thinking in terms of CGI behavior 
where if you don't provide the status header the server guesses -- 
either doing 200, or 302 if there's a Location header to a remote 
location, or an internal redirect otherwise.  WSGI explicitly doesn't 
allow that, but it's a clearer requirement when the application has to 
explicitly say what the status code is.  If status was a header, I think 
we'd have to deal with a situation when that header was missing.

I'm also +1 on turning status into an integer.  I think it makes things 
a little simpler, and those message strings are just a distraction.  The 
final server can put that string in ("200 OK", etc) if it wants to, but 
if it doesn't it doesn't matter.

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From ianb at colorstudy.com  Tue Aug 31 05:07:51 2004
From: ianb at colorstudy.com (Ian Bicking)
Date: Tue Aug 31 05:07:57 2004
Subject: [Web-SIG] The write callable (vs. file-like object)
Message-ID: <4133EB87.7060501@colorstudy.com>

Another comment I meant to make, but forgot about amid exceptions.  The 
write callable is a bit awkward, because most code wants a file-like 
object, not a callable.  So I had to do the dumb thing of creating a 
fake instance with one "write" instance variable.  That feels silly.  I 
think I'd prefer if the return value from start_response was a file-like 
object.

Arguably, where the callable is harder to use, it's easier to produce. 
E.g., you could pass a bound method (that's not write) as the callable, 
like aList.append.  So I'm not sure about this.  OTOH, returning a 
file-like object leaves open more room for extension.  Like, the ability 
to write unicode; even if we leave it out now I don't see any good place 
where that could be added in the future, as the interface is rather 
minimal in that area.  But my thinking is a little fuzzy in that area.
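
For illustration, the "fake instance" workaround might look roughly like the sketch below. `WriteAdapter` is a made-up name, and the list's `append` method merely stands in for whatever callable a server would hand back. The sketch uses current Python syntax.

```python
class WriteAdapter:
    """Hypothetical adapter: exposes a bare write callable as a file-like object."""

    def __init__(self, write):
        # The callable itself becomes the object's write() method.
        self.write = write

    def writelines(self, lines):
        for line in lines:
            self.write(line)


chunks = []
out = WriteAdapter(chunks.append)   # any one-argument callable works here
print("Hello", file=out)            # code that wants a file-like object
out.writelines(["a", "b"])
```

This also shows the other side of the trade-off: producing the callable is trivial (a bound method such as `chunks.append` is enough), while consuming it from file-oriented code needs the small adapter.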

-- 
Ian Bicking  /  ianb@colorstudy.com  / http://blog.ianbicking.org
From tony at lownds.com  Tue Aug 31 08:15:55 2004
From: tony at lownds.com (tony@lownds.com)
Date: Tue Aug 31 08:34:35 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <55924.68.122.69.79.1093932955.squirrel@*>

> Here are some changes I've proposed in the last few days to resolve issues
> people brought up, but which I haven't gotten much feedback on:
>
> * 'wsgi.fatal_errors' key for exceptions that apps and middleware
> shouldn't
> trap
>

What about defining an exception class that applications can raise with an
HTML payload, which servers are supposed to send to the client?
Middleware should be free to alter the payload as much as they like. The
server should not send the payload when content-type is not html.

By using exceptions as a backchannel, the application and middleware do
not have to keep track of the state to sanely handle an error.

With these examples, the FormatExceptions middleware really needs to be
the "innermost" middleware. I think exception-handling middleware
independent of how it is stacked is a non-goal.

For example,

def an_application(env, start_response):
  try:
     form = read_form(env)
     html = do_work(form)
     start_response('200 OK', [('Content-type', 'text/html')])
     return [html]
  except:
     import sys, cgitb
     # re-raise as the server-supplied error class, with an HTML traceback
     raise env['wsgi.error_class'], cgitb.html(sys.exc_info())

...and middleware that formats the exception:

def FormatExceptions(app):
  import sys, cgitb
  def middleware(env, start_response):
    try:
      return app(env, start_response)
    except:
      raise env['wsgi.error_class'], cgitb.html(sys.exc_info())
  return middleware

...and more complicated middleware that uses this concept:

class AddContent:
  def __init__(self, app, header='', footer=''):
     self.app = app
     self.header = header
     self.footer = footer

  def __call__(self, env, start_response):
    return AddContentHandler(self, env, start_response).run()

  def add_length(self, length):
     return length + len(self.header) + len(self.footer)

class AddContentHandler:
  def __init__(self, add_content, env, start_response):
    self.env = env
    self.orig_start_response = start_response
    self.add_content = add_content
    self.written_header = False
    self.publish_extension()

  def publish_extension(self):
    self.env['wsgi.extensions'].append('add_content')
    self.env['add_content.instance'].append(self.add_content)

  def start_response(self, status, headers):
    self.set_headers(headers)
    self.check_content_length()
    self.orig_write = self.orig_start_response(status,
                                               self.rebuild_headers())
    return self.write

  def write(self, data):
    if not self.written_header:
      self.orig_write(self.add_content.header)
      self.written_header = True
    return self.orig_write(data)

  def run(self):
     try:
       result = self.add_content.app(self.env, self.start_response)
     except self.env['wsgi.error_class'], e:
       # wrap exception html -- try not to duplicate header
       html = str(e)
       if not self.written_header:
         self.written_header = True
         html = self.add_content.header + html
       html += self.add_content.footer
       raise self.env['wsgi.error_class'], html
     else:
       self.result = iter(result)
       return self

  def __iter__(self):
     if not self.written_header:
       self.written_header = True
       yield self.add_content.header
     for i in self.result:
       yield i
     yield self.add_content.footer

-Tony

From py-web-sig at xhaus.com  Tue Aug 31 17:16:31 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Aug 31 17:15:47 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com>
References: <5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com>
Message-ID: <4134964F.4030201@xhaus.com>

Dear Phillip,

OK, now I understand what you're saying about iterators. Sorry for being 
so thick, and thanks for your patience.

More below.

[Alan Kennedy]
 >> I really am confused now by what you say. Is it possible that you're
 >> misunderstanding my approach?

[Phillip J. Eby]
 > No; your approach just isn't portable, and breaks the cross-server
 > compatibility that's the point of WSGI.  See below.

[Snip some java code posted by Alan]

[Phillip J. Eby]
 > ...this code won't work if the application returns, say, a list.  But a
 > list *would* be a perfectly valid iterable in a "normal" WSGI server or
 > gateway; therefore, this approach is broken.
 >
 > Meanwhile, an application that wants to support running in pre-2.2
 > containers *other* than yours, is now forced to implement *both* the old
 > and the new protocol!
 >
 > This is clearly broken, since there's no reason to require
 > backward-compatible application code to implement a protocol that isn't
 > implemented by the version of Python they're trying to support.

My misunderstanding was based on the fact that I mistakenly thought that 
the application object authors would always implement the 2.2 iterator 
protocol on their own objects, i.e. explicit .__iter__() and .next() 
methods, etc: I forgot that they could just return a simple python 
object, e.g. list, etc, which is of course an iterable as well.

[Phillip J. Eby]
 > .... perhaps you should check whether there is a Java API you can
 > access from Jython that's akin to PyObject_GetIter() in the C API,
 > that's used in both Jython 2.1 and Jython 2.2; then your code will be
 > forward and backward compatible without implementing both the old and
 > the new protocols.

Unfortunately not: jython 2.1 does not have such a method in the 
PyObject API. The only iterator related methods in the jython 2.1 
PyObject API are

__getitem__()
__len__()

Jython 2.2alpha does have 2.2 iterator support, i.e. all built-in 
sequence objects implement the 2.2 iterator protocol.

http://cvs.sourceforge.net/viewcvs.py/jython/jython/org/python/core/PyObject.java?rev=2.30&view=log

But jython 2.2 is unfortunately currently out-of-the-question: not 
production quality yet. And it could be a while before it becomes 
production quality. I want to create a robust jython WSGI solution for 
right now.

[Phillip J. Eby]
 > If there is no such API, and you want to support the 2.2 protocol,
 > you'll need to hardcode both the old and new protocols, due to the fact
 > that you're not coding in Python (where a simple "for" loop suffices to
 > ensure portability).

I see now that that is my only option.

Which is fine, it's not actually that much work. And I would have to do 
some of it for WSGI anyway, due to the requirements relating to 
application objects with __len__ methods, etc.

[Alan Kennedy]
 >> To me, the purpose of implementing the 2.2 iterator protocol is so
 >> that applications and components run inside my framework will work, if
 >> they support the 2.2 iterator protocol. I'm really not interested in
 >> the pre-2.2 protocol at all, though I suppose I should be if people
 >> want to run pre-2.2 iterable components in my framework.

[Phillip J. Eby]
 > If a piece of code is written for 2.2 and its iterator protocol, why do
 > you think it'll work in your server at all?

To me, the whole point of implementing the 2.2 iterator protocol under 
jython 2.1 was so that there is at least a sporting chance that 
third-party WSGI components written for cpython 2.2 will run under my 
2.1 container. I only want to do what I can to make sure that jython is 
not left behind .....

[Phillip J. Eby]
 > It's far more likely that
 > the only code you can run in your server will be code written for a 2.1
 > version of Python.

I'm hoping to maximize portability, and to minimize dependencies.

[Phillip J. Eby]
 > And such code, if it has an iterable at all, is
 > going to be written to the old iterator protocol, because it will
 > presumably want to be able to run in pre-2.2 CPython containers, too.

Well, as I mentioned above, I will attempt to explicitly support both 
the old and new iterator protocols.

Do you think other folks developing embedded (i.e. not coded in python) 
frameworks should consider the same?

[Phillip J. Eby]
 > So, no matter what, *no* code is going to work in your server unless it
 > was specifically written for your server: the exact opposite of the
 > point of WSGI.

And framework-specificity is the very thing that I want to avoid most.

Kind regards,

Alan.
From pje at telecommunity.com  Tue Aug 31 17:29:09 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 17:28:41 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <4133EA02.6090301@colorstudy.com>
Message-ID: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>

At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
>After a little thought, I'm -1 on a status header, even with email.Message.

I think email.Message is also dead, due to its absence in Python versions 
prior to 2.2.


>I'm also +1 on turning status into an integer.  I think it makes things a 
>little simpler, and those message strings are just a distraction.  The 
>final server can put that string in ("200 OK", etc) if it wants to, but if 
>it doesn't it doesn't matter.

I'm still -1 on this, for the reasons stated previously.  I might be 
convinced if you can show me that a significant number of popular servers 
already have the necessary table(s) to do this with; e.g. Twisted, ZServer, 
Apache (CGI/FastCGI), mod_python, etc.

In theory, the "reason-phrase" can be null.  In practice, I wonder.  Also, 
I don't think the message strings are "just a distraction": they clarify 
the intent of the code that contains them.
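
To make the "necessary table" concrete: a server that accepted an integer status would need a lookup along the lines of the sketch below. It uses `http.client.responses` from today's standard library, which was not available to the Python versions under discussion here. The function name and the "Unknown" fallback are assumptions of this example.

```python
from http.client import responses  # stdlib table: status code -> reason phrase


def status_line(code, default_reason="Unknown"):
    """Build a 'code reason' status string from an integer status."""
    return "%d %s" % (code, responses.get(code, default_reason))
```

With such a table, `status_line(200)` yields the familiar "200 OK"; without one, every server has to carry its own copy, which is the cost being weighed above.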

From pje at telecommunity.com  Tue Aug 31 17:42:42 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 17:42:11 2004
Subject: [Web-SIG] The write callable (vs. file-like object)
In-Reply-To: <4133EB87.7060501@colorstudy.com>
Message-ID: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com>

At 10:07 PM 8/30/04 -0500, Ian Bicking wrote:
>Another comment I meant to make, but forgot about amid exceptions.  The 
>write callable is a bit awkward, because most code wants a file-like 
>object, not a callable.

What other file-like properties do you want it to have?

Keep in mind that the average response should be sent by calling 'write()' 
at most *once*, to write the entire page content, buffering the output of 
some template.  'write()' imposes a potentially high synchronization cost 
that reduces throughput if it's overused.  It should *not* be used as the 
target of output from any kind of page template.  Application frameworks 
should buffer template output (e.g. to a StringIO) and then either 
'write()' or yield the result.

Multiple calls to 'write()' are for streaming output only, such as each 
segment of a multipart server push, or for supporting frameworks that can't 
work any other way.  I guess I need to beef up the parts that say this.

The preferred mechanism for generating WSGI output is via the iterable 
return value, as it allows the maximum concurrency and throughput for the 
server.  If we didn't need it for backward-compatibility with existing 
frameworks, 'write()' and 'start_response()' simply wouldn't exist, and the 
status and headers would be part of the return value as well.
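
A minimal sketch of the buffering pattern described above, written against current Python (where response bodies are bytes). `render_page` stands in for a real template engine, and the `demo.name` environ key is invented for the example.

```python
import io


def render_page(out, name):
    """Stand-in for a template engine that emits many small pieces."""
    out.write("<html><body>")
    out.write("<p>Hello, %s</p>" % name)
    out.write("</body></html>")


def application(environ, start_response):
    # Buffer all template output instead of pointing the template at write().
    buffer = io.StringIO()
    render_page(buffer, environ.get("demo.name", "world"))
    body = buffer.getvalue().encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    # The whole page goes out as a single chunk of the returned iterable.
    return [body]
```

The server sees exactly one body chunk, so it pays the per-chunk cost once no matter how many small writes the template performed.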


>Like, the ability to write unicode; even if we leave it out now I don't 
>see any good place where that could be added in the future, as the 
>interface is rather minimal in that area.  But my thinking is a little 
>fuzzy in that area.

If Python currently had a "byte array" type, we'd be using that instead of 
strings.  Direct writing of Unicode isn't intended to ever be directly 
supported by the standard, although in principle you could create some kind 
of "encoding middleware" that sits directly atop the application.  (An 
application or framework written to it would technically not be 
WSGI-compliant.)

I guess I need to add something about byte arrays to the spec, especially 
since Java/Jython may have this issue today (i.e. strings are Unicode, but 
for HTTP a byte array is needed).

From amk at amk.ca  Tue Aug 31 17:43:44 2004
From: amk at amk.ca (A.M. Kuchling)
Date: Tue Aug 31 17:44:19 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
References: <4133EA02.6090301@colorstudy.com>
	<5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
Message-ID: <20040831154344.GA17594@rogue.amk.ca>

On Tue, Aug 31, 2004 at 11:29:09AM -0400, Phillip J. Eby wrote:
> At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
> >After a little thought, I'm -1 on a status header, even with email.Message.
> 
> I think email.Message is also dead, due to its absence in Python versions 
> prior to 2.2.

Do note that rfc822.py is on the road to deprecation, presumably in
favour of email.Message.  If email.Message has problems, therefore,
you should try to fix them.

--amk
From pje at telecommunity.com  Tue Aug 31 17:47:51 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 17:47:19 2004
Subject: [Web-SIG] Iterator protocols.
In-Reply-To: <4134964F.4030201@xhaus.com>
References: <5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040829222301.0227cc10@mail.telecommunity.com>
	<5.1.1.6.0.20040830161318.021e7ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040830194335.020e93c0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831114403.0356da10@mail.telecommunity.com>

At 04:16 PM 8/31/04 +0100, Alan Kennedy wrote:
>[Phillip J. Eby]
> > And such code, if it has an iterable at all, is
> > going to be written to the old iterator protocol, because it will
> > presumably want to be able to run in pre-2.2 CPython containers, too.
>
>Well, as I mentioned above, I will attempt to explicitly support both the 
>old and new iterator protocols.
>
>Do you think other folks developing embedded (i.e. not coded in python) 
>frameworks should consider the same?

I don't think this is going to be an issue anywhere else; AFAIK any other 
non-CPython target will have 2.2 iterator support built-in.  For CPython 
2.2 and up, 'PyObject_GetIter()' will do.  If somebody needs to support 
earlier versions, they should just implement the old iterator protocol.  It 
doesn't make any sense to try to support CPython 2.1 objects implementing a 
CPython 2.2 protocol, the special case of Jython notwithstanding.

From pje at telecommunity.com  Tue Aug 31 17:55:24 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 17:54:51 2004
Subject: [Web-SIG] Status code, status header
In-Reply-To: <20040831154344.GA17594@rogue.amk.ca>
References: <5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
	<4133EA02.6090301@colorstudy.com>
	<5.1.1.6.0.20040831111543.01ed9080@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831115305.0356e440@mail.telecommunity.com>

At 11:43 AM 8/31/04 -0400, A.M. Kuchling wrote:
>On Tue, Aug 31, 2004 at 11:29:09AM -0400, Phillip J. Eby wrote:
> > At 10:01 PM 8/30/04 -0500, Ian Bicking wrote:
> > >After a little thought, I'm -1 on a status header, even with 
> email.Message.
> >
> > I think email.Message is also dead, due to its absence in Python versions
> > prior to 2.2.
>
>Do note that rfc822.py is on the road to deprecation, presumably in
>favour of email.Message.  If email.Message has problems, therefore,
>you should try to fix them.

It doesn't actually have any serious problems w/respect to WSGI usage, just 
stuff we don't need.

However, despite our change to a 2.2.2 version target, Jython has since 
then emerged as a use case, so I believe we're moving back to at least a 
2.1 version target.  IIRC, 'email.Message' isn't available in 2.1.

Anyway, the alternative is "list of (name,value) tuples", not anything from 
the rfc822 module.

From pje at telecommunity.com  Tue Aug 31 18:11:03 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 18:10:49 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <55924.68.122.69.79.1093932955.squirrel@*>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>

At 11:15 PM 8/30/04 -0700, tony@lownds.com wrote:
> > Here are some changes I've proposed in the last few days to resolve issues
> > people brought up, but which I haven't gotten much feedback on:
> >
> > * 'wsgi.fatal_errors' key for exceptions that apps and middleware
> > shouldn't
> > trap
> >
>
>What about defining an exception class that applications can raise with an
>HTML payload, which servers are supposed to send to the client?
>Middleware should be free to alter the payload as much as they like. The
>server should not send the payload when content-type is not html.
>
>By using exceptions as a backchannel, the application and middleware do
>not have to keep track of the state to sanely handle an error.

Interesting.  But I think you've just given me an idea for a possibly 
simpler way to do this, with some other advantages.

Suppose that instead of 'start_response(status,headers)' we had 
'set_response(status,headers,body=None)'.  And the difference would be that 
our 'set_response' does nothing until/unless you call write() or yield a 
result from the return iterable.  Therefore, you could call 'set_response' 
multiple times, with only the last such call taking effect.  (If you supply 
a non-None 'body', then calling write() or returning an iterable is an error.)

Now consider error handling middleware: it simply calls 
'set_response(error_status,error_headers,error_body)', and returns None.

At this point, we've isolated the complexity to exist only for streaming 
responses once the first body chunk has been generated.  We can handle this 
by making a call to 'set_response()' a fatal error if a body chunk has been 
generated.  Thus, no special handling is needed by an exception handler: it 
just tries to do 'set_response()', and allows the fatal error (if any) to 
propagate.  Now, the server can catch the fatal error and deal with it.

I think this will let us keep all of the complications in the server, where 
they always have to exist, no matter what else we do.  Exception-handling 
middleware is then delightfully simple.

On the other hand, output-transforming middleware becomes somewhat more 
complex, as it would now have three output sources to transform (body param 
to set_response(), write(), and output iterable).

This is a fairly significant change to the spec, that introduces lots of 
new angles to cover.  But, I think it could be an "exceptionally" clean 
solution to the problem.  ;)
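
To make the proposed semantics easier to discuss, here is a rough server-side sketch. Everything in it is hypothetical: the class and exception names, the `finish()` hook, and the `sent` list that stands in for the client connection are assumptions of this example, not part of any spec.

```python
class ResponseStarted(Exception):
    """Raised when set_response() is called after body output has begun."""


class DeferredResponse:
    def __init__(self):
        self.pending = None    # last (status, headers, body) given to set_response
        self.started = False   # True once status/headers have gone out
        self.sent = []         # stands in for the client connection

    def set_response(self, status, headers, body=None):
        # Only the last call before output begins takes effect.
        if self.started:
            raise ResponseStarted("body output has already begun")
        self.pending = (status, headers, body)

    def write(self, data):
        if self.pending is None:
            raise RuntimeError("write() called before set_response()")
        if self.pending[2] is not None:
            raise RuntimeError("write() not allowed when a body was supplied")
        self._start()
        self.sent.append(data)

    def finish(self):
        """Called by the server once the application is done."""
        if self.pending is None:
            raise RuntimeError("application never called set_response()")
        self._start()
        if self.pending[2] is not None:
            self.sent.append(self.pending[2])

    def _start(self):
        if not self.started:
            status, headers, _ = self.pending
            self.sent.append((status, headers))
            self.started = True
```

An error handler would simply call `set_response(error_status, error_headers, error_body)`. If nothing has been sent yet, that replaces the earlier response; if streaming has begun, `ResponseStarted` propagates to the server.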

From py-web-sig at xhaus.com  Tue Aug 31 19:35:43 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Aug 31 19:31:11 2004
Subject: [Web-SIG] The write callable (vs. file-like object)
In-Reply-To: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com>
References: <5.1.1.6.0.20040831112916.0356d740@mail.telecommunity.com>
Message-ID: <4134B6EF.2010708@xhaus.com>

[Phillip J. Eby]
 > If Python currently had a "byte array" type, we'd be using that instead
 > of strings.  Direct writing of Unicode isn't intended to ever be
 > directly supported by the standard, although in principle you could
 > create some kind of "encoding middleware" that sits directly atop the
 > application.  (An application or framework written to it would
 > technically not be WSGI-compliant.)
 >
 > I guess I need to add something about byte arrays to the spec,
 > especially since Java/Jython may have this issue today (i.e. strings are
 > Unicode, but for HTTP a byte array is needed).

Hmmm: looking under the jython covers, I think there is no problem with 
binary strings.

org.python.core.PyFile implements the write method for *binary* data by 
transcoding the Unicode string using the 
java.lang.String.getBytes(int,int,byte[],int) method (which is 
deprecated because it doesn't transcode unicode characters properly).

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(int,%20int,%20byte[],%20int)

The javadoc says: "Copies characters from this string into the 
destination byte array. Each byte receives the 8 low-order bits of the 
corresponding character. The eight high-order bits of each character are 
not copied and do not participate in the transfer in any way."

Which, AFAICT, is not a problem, because (I'm presuming) jython stores 
binary data as one byte per character of a string, i.e. the low byte. So 
the above transcoding would be fine, when you're dealing with bytes, not 
actual characters.

When the output is *character* data (i.e. the "if (binary)" clause is 
false, see below), the java.lang.String.getBytes() method is used, which 
transcodes properly to bytes, according to the "platform's default 
charset", which is set at JVM startup time.

http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes()

If anyone is interested, here is the code for the 
PyFile.getBytes(String) method, called by PyFile.write().

protected byte[] getBytes(String s)
   {
   // Yes, I known the method is depricated, but it is the fastest
   // way of converting between between byte[] and String
   if (binary)
     {
     byte[] buf = new byte[s.length()];
     s.getBytes(0, s.length(), buf, 0);
     return buf;
     }
   else
     return s.getBytes();
   }

So, I think all is well here: jython knows how to properly manage byte 
strings vs. python strings.
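
As a small illustration of the "8 low-order bits" rule in the javadoc quoted above, here is the same truncation written in current Python. The function name is invented for the example, and this is only a model of the behaviour described, not Jython code.

```python
def low_order_bytes(text):
    """Keep only the low 8 bits of each character, as the quoted javadoc describes."""
    return bytes(ord(ch) & 0xFF for ch in text)
```

For strings whose characters all fall in the range 0 to 255 (the one-byte-per-character case presumed above) the conversion is lossless. Any character above that range silently loses its high bits, which is why the method is unsuitable for real character data.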

Regards,

Alan.

P.S. The spelling mistakes in the code comments above are verbatim from 
the jython 2.1 codebase. All other speeling misteaks are my own ;-)

From py-web-sig at xhaus.com  Tue Aug 31 19:50:31 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Aug 31 19:49:13 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
References: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
Message-ID: <4134BA67.6020603@xhaus.com>

[Phillip J. Eby]
> This is a fairly significant change to the spec, that introduces lots of 
> new angles to cover.  But, I think it could be an "exceptionally" clean 
> solution to the problem.  ;)

+1 on changing the spec until it's perfect, or as close as possible to same.

But I'm trying to manage an implementation here as well.

It would be nice if we could have a simple versioning scheme on the 
spec, i.e. a date string or a version label, which I could use as a 
tag/label in my versioning system. Maybe a change history as well?

Just a suggestion. No problem if it's considered too much hassle.

Kind regards,

Alan.

From py-web-sig at xhaus.com  Tue Aug 31 21:01:24 2004
From: py-web-sig at xhaus.com (Alan Kennedy)
Date: Tue Aug 31 20:56:51 2004
Subject: [Web-SIG] Returned application object and fileno.
Message-ID: <4134CB04.2010803@xhaus.com>

Dear Sig,

Currently the spec says that the application can return an object which 
has a callable fileno attribute, which can return a file descriptor.

The current wording is "If the returned iterable has a fileno attribute, 
the server may assume that this is a fileno() method returning an 
operating system file descriptor, and that it is allowed to read 
directly from that descriptor up to the end of the file, and/or use any 
appropriate operating system facilities (e.g. the sendfile() system 
call) to transmit the file's contents. If the server does this, it must 
begin transmission with the file's current position, and end at the end 
of the file."

Problem is that jython doesn't support file descriptors, or the fileno() 
method. If you invoke fileno() on an org.python.core.PyFile, you get an 
Py.IOError("fileno() is not supported in jpython") exception.

Is there any more portable way that we can detect the application 
returning a file(-like object)?

Maybe checking type(app_object) == types.FileType?

Or checking if the object has a read() method?

I can imagine that a similar problem may arise later with IronPython on 
the MS CLR, which I believe doesn't use file descriptors either: like 
java, it is stream based.
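
One possible server-side approach, sketched in current Python purely for discussion: treat `fileno()` as an optional optimisation and fall back to ordinary iteration whenever it is missing or refuses to work. The helper name is made up. The assumption is that an unsupported `fileno()` raises an I/O-style error, as the Jython message above suggests.

```python
import io
import tempfile


def descriptor_or_none(result):
    """Return an OS file descriptor for result, or None if none is usable."""
    fileno = getattr(result, "fileno", None)
    if fileno is None:
        return None
    try:
        return fileno()
    except (OSError, ValueError):
        # e.g. an in-memory or stream-backed object with no OS-level descriptor
        return None


in_memory = descriptor_or_none(io.BytesIO(b"data"))   # no descriptor available
with tempfile.TemporaryFile() as real_file:
    on_disk = descriptor_or_none(real_file)           # an integer descriptor
```

A server receiving `None` would just iterate over the returned object as usual, so platforms without descriptors lose only the fast path.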

Regards,

Alan.
From pje at telecommunity.com  Tue Aug 31 21:05:23 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 21:04:59 2004
Subject: [Web-SIG] wsgi.fatal_errors
In-Reply-To: <4134BA67.6020603@xhaus.com>
References: <5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<5.1.1.6.0.20040831114934.038e1c80@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831150321.02820e50@mail.telecommunity.com>

At 06:50 PM 8/31/04 +0100, Alan Kennedy wrote:
>[Phillip J. Eby]
>>This is a fairly significant change to the spec, that introduces lots of 
>>new angles to cover.  But, I think it could be an "exceptionally" clean 
>>solution to the problem.  ;)
>
>+1 on changing the spec until it's perfect, or as close as possible to same.
>
>But I'm trying to manage an implementation here as well.
>
>It would be nice if we could have a simple versioning scheme on the spec, 
>i.e. a date string or a version label, which I could use as a tag/label in 
>my versioning system. Maybe a change history as well?

There's a "Last Modified" header:
http://www.python.org/peps/pep-0333.html

And a revision history:
http://cvs.sourceforge.net/viewcvs.py/python/python/nondist/peps/pep-0333.txt

Note that both of these will be slightly out of sync with the "real" Python 
CVS, as both are updated by cronjobs.

From pje at telecommunity.com  Tue Aug 31 23:21:01 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 23:20:30 2004
Subject: [Web-SIG] Re: Pending modifications to PEP 333
In-Reply-To: <20040830193840.6b12ae9b.ods@strana.ru>
References: <5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
	<4132A151.1000006@colorstudy.com>
	<5.1.1.6.0.20040830093138.037788d0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831171705.0238f250@mail.telecommunity.com>

At 07:38 PM 8/30/04 +0400, Denis S. Otkidach wrote:
>On Mon, 30 Aug 2004 09:33:14 -0400
>"Phillip J. Eby" <pje@telecommunity.com> wrote:
>
> > >Below is a class we have used for headers in our framework for several
> > >years. I guess it's more comfortable than a list of tuples or
> > >email.Message. Anyway, we only have to fix the "must have" interface,
> > >not all useful methods.
> >
> > Hi Denis; thanks for the input.  Unfortunately, WSGI needs to either use a
> > class/type that's available in the Python standard library, or else a
> > simple protocol like "sequence of name,value pairs".
>
>"sequence of name,value pairs" is OK - my class satisfies this interface if
>you mean just iterable object when saying "sequence", and not real list.

As it happens, the current spec is ambiguous: it says both "list" and 
"sequence" in different places.  I've standardized it to be "list", as in 
'type(headers) is ListType'.  This means your approach will require you to 
call 'list(myHeadersObject)', but it will allow middleware to manipulate 
the list in-place using boilerplate routines.
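
As a rough illustration of the point above, here is a minimal sketch in present-day Python. The 'MyHeaders' class and the 'replace_header' helper are hypothetical names invented for this example; neither comes from the PEP nor from Denis's framework. The sketch only shows a custom header container being converted with 'list()' and then edited in place by a generic routine.

```python
def replace_header(headers, name, value):
    """Replace every occurrence of `name` in the list, in place.

    Header names are compared case-insensitively. This is a hypothetical
    "boilerplate routine" of the kind middleware might share.
    """
    lowered = name.lower()
    # Slice assignment mutates the caller's list instead of rebinding it.
    headers[:] = [(n, v) for (n, v) in headers if n.lower() != lowered]
    headers.append((name, value))


class MyHeaders:
    """Hypothetical stand-in for a framework-specific header container."""

    def __init__(self):
        self._items = []

    def add(self, name, value):
        self._items.append((name, value))

    def __iter__(self):
        # Iterating yields (name, value) pairs, so list(obj) works.
        return iter(self._items)


my_headers = MyHeaders()
my_headers.add("Content-Type", "text/plain")
my_headers.add("X-Powered-By", "example")

# Convert once before handing the headers to the server or middleware.
headers = list(my_headers)
assert type(headers) is list  # modern spelling of 'type(headers) is ListType'

replace_header(headers, "Content-Type", "text/html")
```

Because the routine uses slice assignment, any other code holding a reference to the same list sees the change, which is the benefit of requiring a real list rather than an arbitrary iterable.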

From pje at telecommunity.com  Tue Aug 31 23:56:11 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Aug 31 23:55:40 2004
Subject: [Web-SIG] Pending modifications to PEP 333
In-Reply-To: <5.1.1.6.0.20040829231908.02293ec0@mail.telecommunity.com>
Message-ID: <5.1.1.6.0.20040831173043.023a93e0@mail.telecommunity.com>

I'm just about to check in a major update to the PEP, per the details 
below.  It will be a while before it shows up in the HTML version of the 
PEP or the sourceforge ViewCVS, though.


At 11:25 PM 8/29/04 -0400, Phillip J. Eby wrote:
>Here are some changes I've proposed in the last few days to resolve issues 
>people brought up, but which I haven't gotten much feedback on:
>
>* 'wsgi.fatal_errors' key for exceptions that apps and middleware 
>shouldn't trap
>
>* 'wsgi.auth_available' flag

I've added these to the "Open Issues" section now.


>* Make the 'headers' object an 'email.Message' (well, there's been some 
>feedback, but I think I addressed the concerns, and there was no feedback 
>since)

...and removed this, because it's effectively dead due to lack of popular 
support, added annoyances, and the need to support pre-2.2 versions of 
Python.  However, I've updated the spec to be unambiguous in requiring a 
*list* of header tuples, so that middleware and servers can modify the 
headers in place using boilerplate routines, if desired.


>* what should a server or gateway's default error handling be, for each of 
>the eight contexts in which an exception can occur?

Added to open issues.


>* notes on writing pre-2.2 compatible iteration code

Completed and added to the PEP.


>* anything else?

The application object must now *always* return an iterable; 'None' is no 
longer a valid return value.  This simplifies server logic and helps 
encourage the use of an iterable.  Also, it's now explicit that the server 
must not try to use any attributes of the iterable not explicitly mentioned 
by the PEP (e.g. 'read()' is a no-no).
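
A minimal sketch of what this rule means in practice, written in present-day Python and assuming the 'application(environ, start_response)' calling convention; details of the draft at the time of writing may differ. 'simple_app' and 'run_once' are hypothetical names used only for this illustration, and byte strings are used for the body as modern Python expects.

```python
def simple_app(environ, start_response):
    """Toy application: always returns an iterable, never None."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, ", b"world\n"]


def run_once(app, environ, out):
    """Hypothetical server loop that records what it would transmit.

    It only iterates over the application's return value. It does not
    probe for read() or any other attribute, even if the object is
    file-like.
    """
    def start_response(status, headers):
        out.append((status, headers))

    result = app(environ, start_response)
    for chunk in result:
        out.append(chunk)


sent = []
run_once(simple_app, {}, sent)
```

An application with no body can return an empty list, so the server never needs a special case for 'None'.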

I've also clarified that 'fileno()', if present, *must* be an OS file 
descriptor, and is only relevant to servers on platforms where file 
descriptors exist.

I've also done a significant edit to further clarify that the 'write()' 
callable is a backward compatibility hack, and isn't intended to be used 
unless you really, really need it.  I've also significantly clarified the 
issues surrounding buffering and streaming.
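
To make the contrast concrete, here is a hedged sketch in present-day Python. It assumes that 'write()' is obtained as the return value of 'start_response', which may not match every revision of the draft. 'legacy_app', 'preferred_app' and 'collect' are hypothetical names for this example only.

```python
def legacy_app(environ, start_response):
    """Push-style output through write(); for code that cannot be restructured."""
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"pushed via write()\n")
    return []  # an iterable is still required, even if it is empty


def preferred_app(environ, start_response):
    """Pull-style output: the server iterates and decides when to send."""
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        yield b"line 1\n"
        yield b"line 2\n"

    return body()


def collect(app):
    """Hypothetical test harness that gathers everything an app emits."""
    chunks = []

    def start_response(status, headers):
        # The returned callable plays the role of write().
        return chunks.append

    for chunk in app({}, start_response):
        chunks.append(chunk)
    return b"".join(chunks)
```

With the generator form the server controls the pace of output, which is why the iterable is the recommended path and 'write()' is only a fallback.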

I also refactored the examples to be more compliant with the spec's 
intentions and to be more explanatory/exemplary of desirable behaviors.

Last, but not least, the language regarding a server modifying or deleting 
application-supplied headers has been tightened: it now applies only to 
connection-management headers, and it states where any replaced or deleted 
headers should be recorded.