[Web-SIG] CherryPy WSGI server and wsgi.input.read() with no argument.

Fri Mar 30 00:09:49 CEST 2007

Have cc'd this over to the web-sig list in case anyone wants to shoot
me down. :-)

On 30/03/07, Robert Brewer <fumanchu at amor.org> wrote:
> > Robert, was doing some testing with CherryPy WSGI server and noted
> > that if read() is called with no arguments on wsgi.input that it just
> > seems to hang indefinitely. Is there a problem here or have I managed
> > to stuff up a very simple test? It works okay when I explicitly
> > specify content length.
>
> That's right. We simply hand the (blocking, makefiled) socket to the app
> as wsgi.input. PEP 333 says:
>
>     "The server is not required to read past the client's
>     specified Content-Length, and is allowed to simulate
>     an end-of-file condition if the application attempts
>     to read past that point. The application should not
>     attempt to read more data than is specified by the
>     CONTENT_LENGTH variable."
>
> We chose to not simulate the EOF, requiring app authors do that for
> themselves (mostly to give apps more flexibility). Note that the app
> side of CherryPy handles this for you by default. But since the spec
> clearly places the responsibility for checking content-length on the
> application side, it seemed redundant to perform the check both on the
> app side and the server side.

As I believe I have pointed out on the Python web-sig list before, the
statement:

"The application should not attempt to read more data than is
specified by the CONTENT_LENGTH variable."

is actually a bit bogus.

This is because a WSGI middleware component or web server could be
acting as an input filter and decompressing a content encoding of gzip
for the request. Since it knows the size will change but cannot know
what the new size will be, except by buffering it all, it should by
rights remove CONTENT_LENGTH. This presents a problem for an
application, as there is then no CONTENT_LENGTH to rely on to know
whether it has read too much input. If you leave CONTENT_LENGTH intact,
the application may think it has read everything when there is in fact
more.
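To make the gzip case concrete, here is a minimal sketch of such an input
filter. The names GunzipInput and gunzip_middleware are hypothetical, and
the sketch assumes the stream underneath already returns an empty string
at the end of the compressed body.

```python
import gzip
import io
import zlib


class GunzipInput:
    """Hypothetical file-like wrapper that inflates a gzip'd body as it is read."""

    def __init__(self, stream, block_size=8192):
        self.stream = stream          # assumed to return b"" at end of body
        self.block_size = block_size
        self.decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # gzip framing
        self.buffer = b""
        self.eof = False

    def read(self, size=-1):
        want_all = size is None or size < 0
        # Pull compressed blocks until the request can be met or input ends.
        while not self.eof and (want_all or len(self.buffer) < size):
            raw = self.stream.read(self.block_size)
            if raw:
                self.buffer += self.decoder.decompress(raw)
            else:
                self.buffer += self.decoder.flush()
                self.eof = True
        if want_all:
            size = len(self.buffer)
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data


def gunzip_middleware(app):
    """Hypothetical middleware: decompress the request and drop CONTENT_LENGTH."""
    def wrapper(environ, start_response):
        if environ.get("HTTP_CONTENT_ENCODING", "").lower() == "gzip":
            environ["wsgi.input"] = GunzipInput(environ["wsgi.input"])
            # The decompressed size is unknown without buffering everything,
            # so the old length is wrong and is removed.
            environ.pop("CONTENT_LENGTH", None)
            environ.pop("HTTP_CONTENT_ENCODING", None)
        return app(environ, start_response)
    return wrapper


# Small demonstration with an in-memory request body.
body = b"x" * 1000
compressed = gzip.compress(body)
seen = {}

def demo_app(environ, start_response):
    seen["content_length"] = environ.get("CONTENT_LENGTH")
    seen["body"] = b"".join(iter(lambda: environ["wsgi.input"].read(64), b""))
    return []

demo_environ = {
    "HTTP_CONTENT_ENCODING": "gzip",
    "CONTENT_LENGTH": str(len(compressed)),
    "wsgi.input": io.BytesIO(compressed),
}
gunzip_middleware(demo_app)(demo_environ, lambda status, headers: None)
```

The demo application has no CONTENT_LENGTH to go on, so its only way to
find the end of the input is to read until an empty string comes back.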

Also, with chunked transfer encoding you will not have CONTENT_LENGTH
either. I know you read it all in and buffer it so you can calculate
it, but that prevents streaming with chunked encoding, where the
content length may depend on a series of end to end communications.
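As a rough illustration of streaming a chunked body without ever knowing
its length, here is a sketch of a reader that hands data on chunk by
chunk. ChunkedInput is a hypothetical name, not CherryPy code, and it
skips chunk extensions and trailer contents for brevity.

```python
import io


class ChunkedInput:
    """Hypothetical streaming reader for a chunked request body."""

    def __init__(self, stream):
        self.stream = stream   # must support readline() and read(n)
        self.left = 0          # bytes remaining in the current chunk
        self.done = False

    def read(self, size=8192):
        if self.done:
            return b""
        if self.left == 0:
            # Chunk header: hex size, optionally followed by ";extensions".
            line = self.stream.readline()
            self.left = int(line.split(b";", 1)[0].strip(), 16)
            if self.left == 0:
                # Zero-length chunk: skip trailers up to the blank line.
                while self.stream.readline() not in (b"\r\n", b"\n", b""):
                    pass
                self.done = True
                return b""
        if size is None or size < 0:
            size = self.left
        data = self.stream.read(min(size, self.left))
        self.left -= len(data)
        if self.left == 0:
            self.stream.readline()  # consume the CRLF that ends the chunk
        return data


# Demonstration: two chunks, the terminating chunk, then unrelated bytes.
wire = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\nGET /next")
chunked = ChunkedInput(wire)
pieces = list(iter(chunked.read, b""))
leftover = wire.read()
```

Each read() returns at most one chunk's worth of data, so nothing has to
be buffered in full before the application sees it.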

Thus, an application should really just ignore CONTENT_LENGTH and
successively call read() in some way until it returns an empty
string. It can't really work reliably in any other way. I believe that
the WSGI adapter should be required (not just allowed) to simulate EOF
if it believes that no more input is available for that request. For
example, it knows at a low level that CONTENT_LENGTH was valid because
no filtering has happened by that point, or that with chunked encoding
the null block has been sent. The adapter is generally the only place
where it can be known that this is the case.
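A minimal sketch of what simulating EOF in the adapter could look like
for the plain CONTENT_LENGTH case, together with the read loop an
application would then use. LimitedInput and read_all are hypothetical
names for illustration only.

```python
import io


class LimitedInput:
    """Hypothetical adapter-side wrapper that simulates EOF.

    `content_length` is the length the adapter trusts at its own level,
    before any filtering has been applied.
    """

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""  # simulated EOF; the socket is not touched again
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data


def read_all(wsgi_input, block_size=8192):
    """Application side: ignore CONTENT_LENGTH, read until an empty string."""
    chunks = []
    while True:
        chunk = wsgi_input.read(block_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Demonstration: the bytes after the body stand in for the next request
# on a keep-alive connection and must not be consumed.
raw = io.BytesIO(b"hello world" + b"NEXT REQUEST")
wrapped = LimitedInput(raw, content_length=11)
body = read_all(wrapped, block_size=4)
after_eof = wrapped.read()
untouched = raw.read()
```

With a wrapper like this in place, a read() with no arguments returns
the rest of the body and then an empty string, instead of blocking on
the socket.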

The only time that CONTENT_LENGTH may be of interest to an application
is if it is acting as a proxy to a downstream web server, as it then
needs to put it in the downstream request. If there is no
CONTENT_LENGTH, or chunked transfer encoding was used, it would be
forced to use chunked encoding for the downstream request.

FWIW, the conclusion I have come to is that when read() is called with
no arguments, then rather than, say, attempting to read all input in
one go based on some content length, the adapter should transparently
insert its own size argument underneath. This size would be a block
size deemed likely to give the best performance for the technology
being used. Thus read() with no arguments would always return
potentially partial data rather than all the data.
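A sketch of that behaviour, assuming the stream underneath already
signals end of input with an empty string. BlockReadInput and the 8192
byte default are hypothetical choices for illustration.

```python
import io


class BlockReadInput:
    """Hypothetical wrapper: read() with no argument reads one block only."""

    def __init__(self, stream, block_size=8192):
        self.stream = stream          # assumed to return b"" at end of input
        self.block_size = block_size  # assumed tuning value, not from any spec

    def read(self, size=None):
        if size is None or size < 0:
            # Transparently substitute the adapter's preferred block size.
            size = self.block_size
        return self.stream.read(size)


# Demonstration: a 10 byte body read with a 4 byte block size.
source = BlockReadInput(io.BytesIO(b"a" * 10), block_size=4)
blocks = list(iter(source.read, b""))
```

A caller that expects read() with no arguments to return everything in
one call would get only the first block here, so this only suits
applications that loop until an empty string is returned.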

This is valid because the semantics of read() for a file-like object
are that one should call it until it returns an empty string as the EOF
indicator. The WSGI PEP is ambiguous in that respect, as it says
wsgi.input is a file-like object but then says you aren't supposed to
read more than CONTENT_LENGTH and that an adapter doesn't have to
simulate EOF. One may say that this overrides the file-like object
properties, but the WSGI way will not work all the time.

Graham