WSGI question: reading headers before message body has been read

Sun Jan 18 16:38:57 EST 2009

On Jan 18, 1:21 pm, Graham Dumpleton <Graham.Dumple... at gmail.com>
wrote:
> On Jan 19, 6:01 am, Ron Garret <r... at flownet.com> wrote:
>
> > I'm writing a WSGI application and I would like to check the content-
> > length header before reading the content to make sure that the content
> > is not too big in order to prevent denial-of-service attacks.  So I do
> > something like this:
>
> > def application(environ, start_response):
> >     status = "200 OK"
> >     headers = [('Content-Type', 'text/html'), ]
> >     start_response(status, headers)
> >     if int(environ['CONTENT_LENGTH'])>1000: return 'File too big'
>
> You should be returning a 413 (Request Entity Too Large) error status
> for that specific case, not a 200 response.
>
> You should also not return a bare string as the response content. The
> server iterates over it and sends it one character at a time, which is
> very inefficient. Wrap it in a list instead.
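
A minimal sketch of the two fixes suggested above, assuming Python 3 (where the response body must be bytes). The names MAX_BODY_BYTES and _respond are hypothetical, and the 400 branch for a malformed header is an addition that the thread does not discuss.

```python
MAX_BODY_BYTES = 1000  # hypothetical limit, taken from the example above


def _respond(start_response, status, body):
    """Send a small plain-text response; body must be bytes."""
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response(status, headers)
    return [body]  # a list, so the server gets the body as one chunk


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty; treat that as "no body".
    raw_length = environ.get('CONTENT_LENGTH') or '0'
    try:
        length = int(raw_length)
        if length < 0:
            raise ValueError(raw_length)
    except ValueError:
        return _respond(start_response, '400 Bad Request',
                        b'Bad Content-Length')

    if length > MAX_BODY_BYTES:
        # Reject before touching wsgi.input at all.
        return _respond(start_response, '413 Request Entity Too Large',
                        b'File too big')

    data = environ['wsgi.input'].read(length)
    message = 'Received %d bytes' % len(data)
    return _respond(start_response, '200 OK', message.encode('ascii'))
```

This only corrects the status code and the response body. As explained further down, it does not by itself stop a browser from uploading the whole file.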
>
> > But this doesn't seem to work.  If I upload a huge file it still waits
> > until the entire file has been uploaded before complaining that it's
> > too big.
>
> > Is it possible to read the HTTP headers in WSGI before the request
> > body has been read?
>
> Yes.
>
> The issue is that, to avoid the client sending the data, the client
> itself has to send the HTTP/1.1 'Expect: 100-continue' header to
> indicate that it will wait for a 100-continue response before sending
> the body. You don't need to handle that as Apache/mod_wsgi does it for
> you, but the only web browser I know of that supports 100-continue is
> Opera. Clients like curl also support it. In other words, if people
> use IE, Firefox or Safari, the request content will be sent
> regardless.
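
To make the client side of this handshake concrete, here is a rough sketch of what a cooperating client does: it sends the request head with an Expect header, then looks at the first status line that comes back before deciding whether to send the body. The function names, host and path are made up for illustration, and no networking is done here.

```python
def build_request_head(host, path, content_length):
    """Return the bytes of a POST request head that asks for 100-continue."""
    lines = [
        'POST %s HTTP/1.1' % path,
        'Host: %s' % host,
        'Content-Length: %d' % content_length,
        'Expect: 100-continue',  # ask the server to approve the body first
        '',                      # blank line ends the header block
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')


def should_send_body(status_line):
    """Decide from the server's first status line whether to send the body.

    status_line is bytes such as b'HTTP/1.1 100 Continue'.
    """
    parts = status_line.split(None, 2)
    return len(parts) >= 2 and parts[1] == b'100'


head = build_request_head('example.com', '/upload', 5000)
go_ahead = should_send_body(b'HTTP/1.1 100 Continue')
rejected = should_send_body(b'HTTP/1.1 413 Request Entity Too Large')
```

A client that never sends the Expect header simply starts streaming the body right after the head, which is why the server-side check cannot save any bandwidth for such clients.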
>
> There is still more to this, though. First off, if you are going to
> handle 413 errors in your own WSGI application and you are using
> mod_wsgi daemon mode, then the request content is still sent by the
> browser regardless, even with Opera. This is because the act of
> transferring content across to the mod_wsgi daemon process triggers
> the return of 100-continue to the client, and so it sends the data.
> There is a ticket for mod_wsgi to implement proper 100-continue
> support for daemon mode, but it will be a while before that happens.
>
> Rather than have the WSGI application handle 413 error cases, you are
> better off letting Apache/mod_wsgi handle it for you. All you need to
> do is use the Apache 'LimitRequestBody' directive. This will check
> the content length for you and send a 413 response without the WSGI
> application even being called. When using daemon mode, this is done
> in the Apache child worker processes, so for the 100-continue case
> the data will not be read at all, and a client such as Opera can
> avoid sending it.
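
For illustration, a configuration along these lines would apply the limit to one URL prefix. The /upload path and the 1000-byte value are placeholders and do not come from the thread. LimitRequestBody takes a size in bytes, and 0 means unlimited.

```apache
# Hypothetical location; adjust to wherever the WSGI application is mounted.
<Location /upload>
    # Reject request bodies larger than 1000 bytes with a 413 response.
    LimitRequestBody 1000
</Location>
```

With this in place, the check in the WSGI application becomes a second line of defence rather than the primary one.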
>
> The only caveat is that the currently available mod_wsgi has a bug
> such that 100-continue requests do not always work in daemon mode.
> You need to apply the fix in:
>
>  http://code.google.com/p/modwsgi/issues/detail?id=121
>
> For details on LimitRequestBody directive see:
>
>  http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody
>
> Graham

Thanks for the detailed response!

rg