From mnot at mnot.net Wed Sep 5 04:04:15 2007
From: mnot at mnot.net (Mark Nottingham)
Date: Wed, 5 Sep 2007 12:04:15 +1000
Subject: [Web-SIG] Chunked Tranfer encoding on request content.
In-Reply-To: <1173050906.11628@dscpl.user.openhosting.com>
References: <1173050906.11628@dscpl.user.openhosting.com>
Message-ID: 

Are you actually seeing chunked request bodies in the wild? If so, from what UAs?

IME they're not very common, because of lack of support in most servers, and some interop issues with proxies (IIRC).

Cheers,

On 05/03/2007, at 10:28 AM, Graham Dumpleton wrote:

> The WSGI specification doesn't really say much about chunked transfer
> encoding for content sent within the body of a request. The only thing
> that appears to apply is the comment:
>
>     WSGI servers must handle any supported inbound "hop-by-hop" headers
>     on their own, such as by decoding any inbound Transfer-Encoding,
>     including chunked encoding if applicable.
>
> What does this really mean in practice though?
>
> As a means of getting feedback on what is the correct approach I'll go
> through how the CherryPy WSGI server handles it. The problem is that
> the CherryPy approach raises a few issues which makes me wonder if it
> is doing it in the most appropriate way.
>
> In CherryPy, when it sees that the Transfer-Encoding is set to
> 'chunked' while parsing the HTTP headers, it will at that point, even
> before it has called start_response for the WSGI application, read in
> all content from the body of the request.
>
> CherryPy reads in the content like this for two reasons. The first is
> so that it can then determine the overall length of the content that
> was available and set the CONTENT_LENGTH value in the WSGI environ.
> The second reason is so that it can read in any additional HTTP header
> fields that may occur in the trailer after the last data chunk and
> also incorporate them into the WSGI environ.
>
> The first issue with what it does is that it has read in all the
> content. This denies a WSGI application the ability to stream content
> from the body of a request and process it a bit at a time. If the
> content is huge, the fact that it is buffered can also mean the
> application process size will grow significantly.
>
> The second issue, although I am confused on whether the CherryPy WSGI
> server actually implements this correctly, is that if the client was
> expecting to see a 100 continue response, this will need to be sent
> back to the client before any content can be read. When chunked
> transfer encoding is not used, such a 100 continue response would in a
> good WSGI server only be sent when the WSGI application called read()
> on wsgi.input for the first time. I.e., the 100 continue indicates
> that the application which is consuming the data is actually ready to
> start processing it. What the CherryPy WSGI server is doing is
> circumventing that, and the client could think the final consumer
> application is ready before it actually is.
>
> Note that I am assuming here that 100 continue is still usable in
> conjunction with chunked transfer encoding. The CherryPy WSGI server
> only actually sends the 100 continue when it attempts to read content
> in the presence of a chunked transfer encoding header. Not sure if
> this is actually a bug or not.
>
> The CherryPy WSGI server also doesn't wait until the first read() by
> the WSGI application before sending back the 100 continue either, and
> instead sends it as soon as the headers are parsed. This may be fine,
> but possibly not optimal, as it denies an application the ability to
> fail a request and avoid a client sending the actual content.
>
> Now, to my mind, the preferred approach would be that the content
> would not be read up front like this and instead CONTENT_LENGTH would
> simply be unset in the WSGI environ.
>
> From prior discussions related to input filtering on the list, a WSGI
> application shouldn't really be paying much attention to CONTENT_LENGTH
> anyway and should just be using read() to get data until it returns an
> empty string. Thus, for chunked data, that it doesn't know the content
> length up front shouldn't matter as it should just call read() until
> there is no more. BTW, it may not be this simple for something like a
> proxy, but that is a discussion for another time.
>
> Doing this also means that the 100 continue only gets sent when the
> application is ready and there is no need for the content to be
> buffered up.
>
> That it is the actual application which is consuming the data and not
> some intermediary means that an application could implement some
> mechanism whereby it reads some data, acts on that and starts sending
> some data in response. The client then might send more data based on
> that response which the application only then reads, sends more data
> in response, etc. Thus an end-to-end communication stream can be
> established where the actual overall content length of the request
> could never be established up front.
>
> The only problem with deferring any reading of data to when the
> application wants to actually read it, is that if the overall length
> of content in the request is bounded, there is no way to get access to
> the additional headers in the trailer of the request and have them
> available in the WSGI environ since processing of the WSGI environ has
> already occurred before any data was read.
>
> So, what gives? What should a WSGI server do for chunked transfer
> encoding on a request?
>
> I may not totally understand 100 continue and chunked transfer
> encoding and am happy to be corrected in my understanding of them, but
> what the CherryPy WSGI server does doesn't seem right to me at first
> look.
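The read-until-empty approach described above can be sketched as a minimal WSGI application (an illustrative example, shown with bytes as in present-day WSGI; the function name and buffer size are arbitrary choices, not anything from the thread):

```python
import io

def echo_app(environ, start_response):
    # Stream the request body by calling read() until it returns an
    # empty value; never depend on CONTENT_LENGTH, which may be unset
    # when the request arrived with chunked transfer encoding.
    chunks = []
    while True:
        data = environ['wsgi.input'].read(8192)
        if not data:
            break
        chunks.append(data)
    body = b''.join(chunks)
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]

# Drive the application with a fake environ to show the streaming loop.
status_seen = []
environ = {'wsgi.input': io.BytesIO(b'hello chunked world')}
result = echo_app(environ, lambda status, headers: status_seen.append(status))
```

The application never consults CONTENT_LENGTH, so it behaves identically whether the server knows the body length up front or is decoding chunks as they arrive.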
> > Graham > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/mnot% > 40mnot.net -- Mark Nottingham http://www.mnot.net/ From mnot at mnot.net Wed Sep 5 03:44:28 2007 From: mnot at mnot.net (Mark Nottingham) Date: Wed, 5 Sep 2007 11:44:28 +1000 Subject: [Web-SIG] proxy-connection header in wsgiref.utils.is_hop_by_hop In-Reply-To: <8E587901-CA61-4E1A-BC78-7675332EEC62@osafoundation.org> References: <8E587901-CA61-4E1A-BC78-7675332EEC62@osafoundation.org> Message-ID: <791D04CE-5CCF-4EED-A703-48D0114E6032@mnot.net> Think so... (belatedly) On 22/11/2006, at 8:50 AM, Mikeal Rogers wrote: > I implemented a functional proxy last night using wsgiref and I > noticed that 'proxy-connection' isn't in the _hoppish dict. > > Although proxy-connection isn't in the HTTP 1.1 spec as hop-by-hop it > is an anomalous HTTP 1.0 header that is treated as a hop-by-hop > header in most cases and is removed by the proxy when fulfilling > requests. > > Should it be added to the _hoppish dict? > > -Mikeal > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/mnot% > 40mnot.net -- Mark Nottingham http://www.mnot.net/ From graham.dumpleton at gmail.com Wed Sep 5 13:55:14 2007 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Wed, 5 Sep 2007 21:55:14 +1000 Subject: [Web-SIG] Chunked Tranfer encoding on request content. In-Reply-To: References: <1173050906.11628@dscpl.user.openhosting.com> Message-ID: <88e286470709050455n181f2fecqfd1181a740e3683d@mail.gmail.com> On 05/09/07, Mark Nottingham wrote: > Are you actually seeing chunked request bodies in the wild? If so, > from what UAs? 
> > IME they're not very common, because of lack of support in most > servers, and some interop issues with proxies (IIRC). It has come up as an issue on mod_python list a couple of times. Agree though that it isn't common. From memory the people were using custom user agents designed for a special purpose. Just because it isn't common doesn't mean that an attempt shouldn't be made to support it, especially if it is part of the HTTP standard. Also, the same solution for handling this would also be applicable in cases where mutating input filters are used which change the length of the request content but are unable to update the content length header. Thus, like with chunked encoding, a way is needed in this circumstance to indicate that there is content, but the length isn't known. Graham > On 05/03/2007, at 10:28 AM, Graham Dumpleton wrote: > > > The WSGI specification doesn't really say much about chunked > > transfer encoding > > for content sent within the body of a request. The only thing that > > appears to > > apply is the comment: > > > > WSGI servers must handle any supported inbound "hop-by-hop" > > headers on their > > own, such as by decoding any inbound Transfer-Encoding, including > > chunked > > encoding if applicable. > > > > What does this really mean in practice though? > > > > As a means of getting feedback on what is the correct approach I'll > > go through > > how the CherryPy WSGI server handles it. The problem is that the > > CherryPy > > approach raises a few issues which makes me wander if it is doing > > it in the > > most appropriate way. > > > > In CherryPy, when it sees that the Transfer-Encoding is set to > > 'chunked' while > > parsing the HTTP headers, it will at that point, even before it has > > called > > start_response for the WSGI application, read in all content from > > the body of > > the request. > > > > CherryPy reads in the content like this for two reasons. 
The first > > is so that > > it can then determine the overall length of the content that was > > available and > > set the CONTENT_LENGTH value in the WSGI environ. The second reason > > is so that > > it can read in any additional HTTP header fields that may occur in > > the trailer > > after the last data chunk and also incorporate them into the WSGI > > environ. > > > > The first issue with what it does is that it has read in all the > > content. This denies > > a WSGI application the ability to stream content from the body of a > > request and > > process it a bit at a time. If the content is huge, that it buffers > > it can also mean > > the application process size will grow significantly. > > > > The second issue, although I am confused on whether the CherryPy > > WSGI server > > actually implements this correctly, is that if the client was > > expecting to see a > > 100 continue response, this will need to be sent back to the client > > before any > > content can be read. When chunked transfer encoding is not used, > > such a 100 > > continue response would in a good WSGI server only be sent when the > > WSGI > > application called read() on wsgi.input for the first time. Ie., > > the 100 continue > > indicates that the application which is consuming the data is > > actually ready to > > start processing it. What CherryPy WSGI server is doing is > > circumventing that and > > the client could think the final consumer application is ready > > before it actually is. > > > > Note that I am assuming here that 100 continue is still usable in > > conjunction > > with chunked transfer encoding. In CherryPy WSGI server it only > > actually sends > > the 100 continue after it attempts to try and read content in the > > presence of a > > chunked transfer encoding header. Not sure if this is actually a > > bug or not. 
> > > > CherryPy WSGI server also doesn't wait until first read() by WSGI > > application > > before sending back the 100 continue either and instead sends it as > > soon as the > > headers are parsed. This may be fine, but possibly not most optimal > > as it denies > > an application the ability to fail a request and avoid a client > > sending the > > actual content. > > > > Now, to my mind, the preferred approach would be that the content > > would not > > be read up front like this and instead CONTENT_LENGTH would simply > > be unset > > in the WSGI environ. > > > >> From prior discussions related to input filtering on the list, a WSGI > > application shouldn't really be paying much attention to > > CONTENT_LENGTH anyway > > and should just be using read() to get data until it returns an > > empty string. > > Thus, for chunked data, that it doesn't know the content length up > > front > > shouldn't matter as it should just call read() until there is no > > more. BTW, it may > > not be this simple for something like a proxy, but that is a > > discussion for another > > time. > > > > Doing this also means that the 100 continue only gets sent when the > > application > > is ready and there is no need to for the content to be buffered up. > > > > That it is the actual application which is consuming the data and > > not some > > intermediary means that an application could implement some > > mechanism whereby > > it reads some data, acts on that and starts sending some data in > > response. The > > client then might send more data based on that response which the > > application > > only then reads, send more data as response etc. Thus an end to end > > communication stream can be established where the actual overall > > content length > > of the request could never be established up front. 
> > > > The only problem with deferring any reading of data to when the > > application > > wants to actually read it, is that if the overall length of content > > in the request > > is bounded, there is no way to get access to the additional headers > > in the trailer > > of the request and have them available in the WSGI environ since > > processing of > > the WSGI environ has already occurred before any data was read. > > > > So, what gives. What should a WSGI server do for chunked transfer > > encoding on > > a request? > > > > I may not totally understand 100 continue and chunked transfer > > encoding and > > am happy to be correct in my understanding of them, but what > > CherryPy WSGI > > server does doesn't seem right to me at first look. > > > > Graham > > _______________________________________________ > > Web-SIG mailing list > > Web-SIG at python.org > > Web SIG: http://www.python.org/sigs/web-sig > > Unsubscribe: http://mail.python.org/mailman/options/web-sig/mnot% > > 40mnot.net > > > -- > Mark Nottingham http://www.mnot.net/ > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/graham.dumpleton%40gmail.com > From pywebsig at xhaus.com Wed Sep 5 21:53:00 2007 From: pywebsig at xhaus.com (Alan Kennedy) Date: Wed, 05 Sep 2007 20:53:00 +0100 Subject: [Web-SIG] Modjy and jython 2.2. Message-ID: <46DF091C.9030406@xhaus.com> Dear all, Now that jython 2.2 has been released (hooray!) http://www.jython.org/Project/download.html it's time for a quick update on the status of modjy, the jython WSGI/J2EE gateway. http://www.xhaus.com/modjy/ Previous versions of modjy were based on jython 2.1, which didn't have support for the iterator protocol. However, the new jython 2.2 has full iterator and generator support, and so is capable of full WSGI support (round of applause for the hard work of the jython-dev team). 
In a testament to the stability of jython and the clean design of WSGI, the modjy code has not changed; the original jython 2.1 version of modjy works seamlessly with jython 2.2, unmodified. Still, I am making an interim release, for two purposes 1. To fix a longstanding bug in the implementation 2. To explicitly mention jython 2.2 in the documentation I'm off on vacation soon, and wanted to make this small "publicity release" before I go. When I return, I will be making the following modifications 1. Adding a full test suite, based on MockRunner, the mock Java Servlet framework. 2. Improving J2EE resource handling 3. Improving import handling 4. Various small improvements and documentation updates. All the best, Alan. From manlio_perillo at libero.it Sat Sep 15 18:47:56 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Sat, 15 Sep 2007 18:47:56 +0200 Subject: [Web-SIG] [ANN] nginx wsgi module draft Message-ID: <46EC0CBC.9000805@libero.it> Hi all, this my first post here. I'm pleased to announce the availability of the first working draft of the wsgi module for nginx. nginx (http://nginx.net/) is a high performance open source web server written by Igor Sysoev. The code is available as a Mercurial repository: http://hg.mperillo.ath.cx/nginx/mod_wsgi/ It is possible to download a snapshot in gzip format: http://hg.mperillo.ath.cx/nginx/mod_wsgi/archive/tip.tar.gz I have not yet defined a tag, but I'm tring to commit only full working code. 
There is still a lot of work to do and the current implementation has some limitations (some due to design choices, others to keep the implementation as simple as possible, at least for the first versions):

1) I have implemented the WSGI 2.0 draft (http://wsgi.org/wsgi/WSGI_2.0)
2) "app_iter" must produce only one item
3) Exceptions are not logged
4) Only one WSGI app can be executed, for a given nginx instance
5) SCRIPT_NAME and PATH_INFO environment variables are not supported (I have asked for help some messages ago)

IMPORTANT: Since the WSGI application is executed in the nginx process cycle, it can block the entire server.

The documentation is still missing, but you can use the documentation that comes with Apache mod_wsgi:
http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives

nginx mod_wsgi additionally supports the directive "wsgi_param", similar to fastcgi_param:
http://wiki.codemongers.com/NginxHttpFcgiModule#fastcgi_param

In conf/wsgi_params the params required by WSGI are defined. In example/ there is a small example.

To compile nginx, it suffices to do something like:

  ./configure --add-module=/home/manlio/projects/hg/nginx/mod_wsgi/ \
      --with-debug
  make
  make install

from the main directory of the nginx sources (I'm using version 0.5.31). By default nginx is installed in /usr/local/nginx; I suggest choosing as --prefix a directory where you have write access as a limited user, so you can install and execute nginx without being root.

I would like to thank Igor Sysoev for having released nginx under an Open Source license, and for having written such good code (although I would like to see more documentation :-)). Many thanks to Evan Miller for his Guide To Nginx Module Development. And many thanks to Graham Dumpleton (the author of the WSGI module implementation for Apache) since his module has been an invaluable resource for me. ngx_http_wsgi_module contains some pieces of code taken from mod_wsgi.c.
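For limitations 1 and 2 above: under the WSGI 2.0 draft as it stood at the time, an application is a single callable that takes the environ and returns a (status, headers, body) triple instead of using start_response. A hypothetical application for this module (names and body text are illustrative only) would look like:

```python
def application(environ):
    # WSGI 2.0 draft style: there is no start_response callable; the
    # status, headers and body iterable come back as a single tuple.
    body = b'Hello from nginx mod_wsgi'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    # Per limitation (2) above, the body iterable holds exactly one item.
    return '200 OK', headers, [body]

# A caller (the server) unpacks the triple directly.
status, headers, body = application({'REQUEST_METHOD': 'GET'})
```

Returning a one-item list keeps the app compatible with the module's "app_iter must produce only one item" restriction while still satisfying the draft's iterable-body contract.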
I hope that the nginx wsgi module will help to make WSGI more "asynchronous app" friendly, since the current implementation seems not to address web servers like nginx.

Manlio Perillo

From manlio_perillo at libero.it Tue Sep 18 13:39:23 2007
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 18 Sep 2007 13:39:23 +0200
Subject: [Web-SIG] WSGI and asyncronous applications support
Message-ID: <46EFB8EB.4000903@libero.it>

Hi.

I have read some old posts about asynchronous applications support in WSGI:
http://comments.gmane.org/gmane.comp.python.twisted.web/2561?set_lines=100000
http://comments.gmane.org/gmane.comp.python.twisted.web/632?set_lines=100000

Since nginx *does not* support threads, it is important to add extensions to WSGI to support asynchronous applications.

There are two solutions:

1) The application returns something like environ['wsgi_not_yet_done'] and produces its output using the write callable, and the finish callable when it's done

2) The application calls resume = environ['wsgi.pause_output'](), yields an empty string, and then calls resume() to notify the server that it can call .next() again

1) should be very simple to implement, and it is easy to understand how to use it.

As an example, we can use some API that calls a callback function when the result is available:

    conn.execute("SELECT * FROM test", query_callback)

    def query_callback(row):
        write(row[...])

2) Can be implemented in mod_wsgi, however my problem is that I can't figure out how the application can yield some data available after a callback is called.
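Option 2 can be simulated outside any real server. In the sketch below, FakeAsyncServer is a stand-in for an event loop like nginx's, and 'wsgi.pause_output' is the proposed (hypothetical) extension key: the server stops calling .next() once the application has yielded while paused, and resume() restarts iteration.

```python
class FakeAsyncServer:
    """Toy stand-in for an event-driven server such as nginx."""

    def __init__(self, app):
        self.paused = False
        self.output = []
        environ = {'wsgi.pause_output': self._pause}
        self.app_iter = app(environ)

    def _pause(self):
        # The application asks to pause; hand back the resume callable.
        self.paused = True
        return self._resume

    def _resume(self):
        self.paused = False
        self._pump()

    def _pump(self):
        # Keep calling next() on the app iterable until it pauses or ends.
        for chunk in self.app_iter:
            if chunk:
                self.output.append(chunk)
            if self.paused:
                return

def app(environ):
    resume = environ['wsgi.pause_output']()
    # Imagine an asynchronous query is started here, with resume()
    # registered as its completion callback.
    yield ''          # nothing to send yet; the server stops iterating
    yield 'row-1'     # after resume(): the result is now available

server = FakeAsyncServer(app)
server._pump()        # the application pauses immediately
# ... later, the I/O completion callback fires and calls resume():
server._resume()
```

After the resume, server.output holds the data produced past the pause point, which is exactly the "yield data obtained via a callback" behaviour in question.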
Thanks
Manlio Perillo

From exarkun at divmod.com Tue Sep 18 14:40:57 2007
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Tue, 18 Sep 2007 08:40:57 -0400
Subject: [Web-SIG] WSGI and asyncronous applications support
In-Reply-To: <46EFB8EB.4000903@libero.it>
Message-ID: <20070918124057.8162.665308344.divmod.quotient.10796@ohm>

On Tue, 18 Sep 2007 13:39:23 +0200, Manlio Perillo wrote:
> [snip]
>
> 1) should be very simple to implement, and it is easy to understand how
> to use it.
>
> As an example, we can use some API that calls a callback function when
> the result is available:
>
>     conn.execute("SELECT * FROM test", query_callback)
>
>     def query_callback(row):
>         write(row[...])
>
> 2) Can be implemented in mod_wsgi, however my problem is that I can't
> figure out how the application can yield some data available after a
> callback is called.

I think you figured it out already, actually:

    def app(...):
        conn.execute("SELECT * FROM test", query_callback)
        # indicate not-done-yet, however

    def query_callback(...):
        write(...)
        conn.execute("SELECT * FROM test2", another_callback)

    def another_callback(...):
        write(...)
        finish()

If you can have one callback, then there's no reason you shouldn't be able to have an arbitrary number of callbacks.

Of course, this could also be expressed in a less error prone manner:

    def app(...):
        test_deferred = conn.execute("SELECT * FROM test")
        test_deferred.addCallback(query_callback)
        return test_deferred

    def query_callback(...):
        write(...)
        test2_deferred = conn.execute("SELECT * FROM test2")
        test2_deferred.addCallback(another_callback)
        return test2_deferred

    def another_callback(...):
        write(...)
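The second variant assumes Twisted-style Deferreds. A toy reimplementation (not Twisted's actual code; conn.execute is replaced by a fake execute helper) makes the control flow concrete:

```python
class Deferred:
    """Minimal stand-in for twisted.internet.defer.Deferred."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, fn):
        self._callbacks.append(fn)
        return self

    def callback(self, result):
        # Fire the chain: each callback receives the previous result.
        for fn in self._callbacks:
            result = fn(result)

# A fake asynchronous query: the Deferred fires only when the event
# loop later delivers the rows.
pending = []

def execute(sql):
    d = Deferred()
    pending.append(d)
    return d

rows_seen = []

def app():
    d = execute("SELECT * FROM test")
    d.addCallback(rows_seen.append)
    return d

app()
assert rows_seen == []                      # nothing has fired yet
pending.pop().callback(['row1', 'row2'])    # the event loop delivers rows
```

The application returns immediately with a Deferred; only when callback() fires does the registered function run, which is the decoupling the thread is discussing.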
;) Jean-Paul From manlio_perillo at libero.it Tue Sep 18 16:44:46 2007 From: manlio_perillo at libero.it (Manlio Perillo) Date: Tue, 18 Sep 2007 16:44:46 +0200 Subject: [Web-SIG] WSGI and asyncronous applications support In-Reply-To: <20070918124057.8162.665308344.divmod.quotient.10796@ohm> References: <20070918124057.8162.665308344.divmod.quotient.10796@ohm> Message-ID: <46EFE45E.5010306@libero.it> Jean-Paul Calderone ha scritto: > On Tue, 18 Sep 2007 13:39:23 +0200, Manlio Perillo wrote: >> [snip] >> >> 1) should be very simple to implement, and it is easy to understand how >> to use it. >> >> As an example, we can use some API that calls a callback function when >> the result is available: >> conn.execute("SELECT * FROM test", query_callback) >> >> def query_callback(row): >> write(row[...]) >> >> >> 2) Can be implemented in mod_wsgi, however my problem is that I can't >> figure out how the application can yield some data available after a >> callback is called. >> > > I think you figured it out already, actually: > Right. > def app(...): > conn.execute("SELECT * FROM test", query_callback) > # indicate not-done-yet, however > > def query_callback(...): > write(...) > conn.execute("SELECT * FROM test2", another_callback) > > def another_callback(...): > write(...) > finish() > > If you can have one callback, then there's no reason you shouldn't be > able to have an arbitrary number of callbacks. > The problem is not with callbacks, but how the application can yield the data obtained via a callback > Of course, this could also be expressed in a less error prone manner: > > [...] 
I'm thinking about a different solution, using wsgi.pause_output:

    def app(env):
        def cb():
            resume()

        # This sends an asynchronous query request
        r = conn.execute("SELECT * FROM test", on_state_ready=cb)

        resume = env['wsgi.pause_output']()
        yield ''

        # Now you can read the data, as if the application were synchronous
        yield str(r.fetchall())

Not sure if this is possible, I'll try to do some tests with Twisted.

Thanks and regards
Manlio Perillo

From manlio_perillo at libero.it Tue Sep 25 11:03:25 2007
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 25 Sep 2007 11:03:25 +0200
Subject: [Web-SIG] WSGI test suite
Message-ID: <46F8CEDD.2060600@libero.it>

Hi.

I'm searching for a generic test suite for WSGI implementations.
Any suggestions?

I will use it to test my mod_wsgi implementation for nginx.

Thanks
Manlio Perillo

From manlio_perillo at libero.it Tue Sep 25 12:54:11 2007
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Tue, 25 Sep 2007 12:54:11 +0200
Subject: [Web-SIG] start_response and error checking
Message-ID: <46F8E8D3.3050002@libero.it>

The WSGI spec says that the start_response callable *must not* actually transmit the response headers. Instead, it must store them.

The problem is that it says nothing about error checking.
As an example, the Apache mod_wsgi implementation only checks that the object is a Python List Object.

This means that I can do:

    start_response('200 OK', [1, 2, 3])

with no exception being raised (the exception will only be raised when I attempt to write some data).

Is this the intended behaviour?

P.S.:
I'm not sure, but it seems that Apache mod_wsgi allows status codes with more than 3 digits, without reporting an error.
Again, is this the intended, conforming, behaviour?

Thanks and regards
Manlio Perillo

From pje at telecommunity.com Thu Sep 27 20:23:16 2007
From: pje at telecommunity.com (Phillip J.
Eby)
Date: Thu, 27 Sep 2007 14:23:16 -0400
Subject: [Web-SIG] WSGI test suite
In-Reply-To: <46F8CEDD.2060600@libero.it>
References: <46F8CEDD.2060600@libero.it>
Message-ID: <20070927182039.CC19C3A407B@sparrow.telecommunity.com>

At 11:03 AM 9/25/2007 +0200, Manlio Perillo wrote:
> Hi.
>
> I'm searching for a generic test suite for WSGI implementations.
> Any suggestions?
>
> I will use it to test my mod_wsgi implementation for nginx.

wsgiref.validate is the closest thing available to a test suite.

From pje at telecommunity.com Thu Sep 27 20:26:18 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 27 Sep 2007 14:26:18 -0400
Subject: [Web-SIG] start_response and error checking
In-Reply-To: <46F8E8D3.3050002@libero.it>
References: <46F8E8D3.3050002@libero.it>
Message-ID: <20070927182342.AC6AB3A407B@sparrow.telecommunity.com>

At 12:54 PM 9/25/2007 +0200, Manlio Perillo wrote:
> The WSGI spec says that the start_response callable *must not* actually
> transmit the response headers. Instead, it must store them.
>
> The problem is that it says nothing about error checking.
> As an example, the Apache mod_wsgi implementation only checks that the
> object is a Python List Object.
>
> This means that I can do:
>
> start_response('200 OK', [1, 2, 3])
>
> with no exception being raised (the exception will only be raised when
> I attempt to write some data).
>
> Is this the intended behaviour?

No. start_response() *should* raise an error when given the bad data. This should probably be fixed in the PEP.

> P.S.:
> I'm not sure, but it seems that Apache mod_wsgi allows status codes
> with more than 3 digits, without reporting an error.
> Again, is this the intended, conforming, behaviour?

No. It should be rejected. In general, a WSGI server *should* reject bad input as soon as it receives it.

All that being said, these points are "shoulds" rather than "musts". A good implementation should implement them.
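The early rejection described above can be sketched as a validating start_response (illustrative only; the helper name, regex, and exception types are choices of this example, not anything mandated by the PEP):

```python
import re

# Exactly three digits, a space, then a reason phrase.
_STATUS_RE = re.compile(r'^\d{3} .+')

def make_start_response(state):
    """Build a start_response that rejects bad input immediately."""
    def start_response(status, response_headers, exc_info=None):
        if not isinstance(status, str) or not _STATUS_RE.match(status):
            raise ValueError('malformed status line: %r' % (status,))
        if not isinstance(response_headers, list):
            raise TypeError('headers must be a list of (name, value) tuples')
        for item in response_headers:
            if (not isinstance(item, tuple) or len(item) != 2
                    or not all(isinstance(v, str) for v in item)):
                raise TypeError('bad header item: %r' % (item,))
        state['status'] = status
        state['headers'] = response_headers
        return lambda data: None   # write callable (unused in this sketch)
    return start_response

state = {}
start_response = make_start_response(state)
start_response('200 OK', [('Content-Type', 'text/plain')])  # accepted
```

With these checks, both of Manlio's examples fail at the start_response call itself: the `[1, 2, 3]` headers and a four-digit status like '2000 OK' are rejected before any data is written.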
From graham.dumpleton at gmail.com Fri Sep 28 01:16:41 2007
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 28 Sep 2007 09:16:41 +1000
Subject: [Web-SIG] start_response and error checking
In-Reply-To: <20070927182342.AC6AB3A407B@sparrow.telecommunity.com>
References: <46F8E8D3.3050002@libero.it> <20070927182342.AC6AB3A407B@sparrow.telecommunity.com>
Message-ID: <88e286470709271616w5199a5e4y810eafdb914e755f@mail.gmail.com>

On 28/09/2007, Phillip J. Eby wrote:
> At 12:54 PM 9/25/2007 +0200, Manlio Perillo wrote:
> > The WSGI spec says that the start_response callable *must not* actually
> > transmit the response headers. Instead, it must store them.
> >
> > The problem is that it says nothing about error checking.
> > As an example, the Apache mod_wsgi implementation only checks that the
> > object is a Python List Object.
> >
> > This means that I can do:
> >
> > start_response('200 OK', [1, 2, 3])
> >
> > with no exception being raised (the exception will only be raised when
> > I attempt to write some data).
> >
> > Is this the intended behaviour?
>
> No. start_response() *should* raise an error when given the bad
> data. This should probably be fixed in the PEP.
>
> > P.S.:
> > I'm not sure, but it seems that Apache mod_wsgi allows status codes
> > with more than 3 digits, without reporting an error.
> > Again, is this the intended, conforming, behaviour?
>
> No. It should be rejected. In general, a WSGI server *should*
> reject bad input as soon as it receives it.
>
> All that being said, these points are "shoulds" rather than
> "musts". A good implementation should implement them.

Except that in both cases, depending on the underlying web server, it may not be reasonable, practical or efficient to do full-depth data checking on inputs at that time. In the case of Apache mod_wsgi, which is being used as an example, checks on the validity of data are done and an error will be raised, but at a later point.
It is this way in part because Apache isn't a solution whose purpose is only to support a WSGI application and nothing else; it has to work in with other features provided by Apache and the mechanisms which have to be used to converse with Apache. Thus it is more appropriate, or simpler, to delay the checks, but the checks are still done.

FWIW, mod_wsgi error checking in this area is actually somewhat better than that of various other solutions, which don't even do type checking on the status or the values in the header tuples. I.e., various implementations do the equivalent of:

    sys.stdout.write('Status: %s\r\n' % status)
    for header in response_headers:
        sys.stdout.write('%s: %s\r\n' % header)

This particular example code is actually taken from the WSGI specification itself. The end result is that one could supply a malformed status line or a header tuple which consists of non-string values, but they will be converted to strings without any complaint. Such things will be flagged by mod_wsgi.

I don't think the point at which data is rejected is the real issue, as long as it is rejected. The bigger problem is all the implementations which allow wrongly typed or malformed values through, as that code isn't portable and isn't being flagged as non-portable. Thus when you move the code to another implementation it may break.

Graham