[Web-SIG] py3k, cgi, email, and form-data

Graham Dumpleton graham.dumpleton at gmail.com
Wed May 13 04:33:02 CEST 2009


2009/5/12 Robert Brewer <fumanchu at aminus.org>:
> There's a major change in functionality in the cgi module between Python
> 2 and Python 3 which I've just run across: the behavior of
> FieldStorage.read_multi, specifically when an HTTP app accepts a file
> upload within a multipart/form-data payload.
>
> In Python 2, each part would be read in sequence within its own
> FieldStorage instance. This allowed file uploads to be shunted to a
> TemporaryFile (via make_file) as needed:
>
>     klass = self.FieldStorageClass or self.__class__
>     part = klass(self.fp, {}, ib,
>                  environ, keep_blank_values, strict_parsing)
>     # Throw first part away
>     while not part.done:
>         headers = rfc822.Message(self.fp)
>         part = klass(self.fp, headers, ib,
>                      environ, keep_blank_values, strict_parsing)
>         self.list.append(part)
>
> In Python 3 (svn revision 72466), the whole request body is read into
> memory first via fp.read(), and then broken into separate parts in a
> second step:
>
>     klass = self.FieldStorageClass or self.__class__
>     parser = email.parser.FeedParser()
>     # Create bogus content-type header for proper multipart parsing
>     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
>     parser.feed(self.fp.read())
>     full_msg = parser.close()
>     # Get subparts
>     msgs = full_msg.get_payload()
>     for msg in msgs:
>         fp = StringIO(msg.get_payload())
>         part = klass(fp, msg, ib, environ, keep_blank_values,
>                      strict_parsing)
>         self.list.append(part)
>
> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, opens a server up for an
> unexpected Denial of Service vector).
>
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the fp.read out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)
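
On the FeedParser point: as far as I can tell it only exposes feed()
and close(), so the writes can be incremental but the parsed message
is only handed back, whole and in memory, once close() is called. A
rough sketch of that, where parse_in_chunks and its arguments are
names I made up for illustration:

```python
from email.parser import FeedParser


def parse_in_chunks(chunks, content_type):
    """Feed a multipart body to FeedParser piece by piece.

    Hypothetical helper. `chunks` is any iterable of str pieces of the
    body; `content_type` is the full header value, boundary included.
    """
    parser = FeedParser()
    # Same trick as the cgi module: a synthetic header block so the
    # parser knows the boundary before it sees the body.
    parser.feed("Content-Type: %s\r\n\r\n" % content_type)
    for chunk in chunks:
        parser.feed(chunk)  # incremental write
    # No incremental read: every part is available only after close(),
    # and each payload is held in memory as a string.
    return parser.close()
```

So feeding in small pieces avoids one big fp.read(), but the payloads
still accumulate inside the parser rather than in a tempfile.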

FWIW, Werkzeug gave up on the 'cgi' module for form parsing and
implements its own parser.

I'm not sure whether this issue in Python 3.0 was one of the reasons.
I do know that one reason was that cgi.FieldStorage is not WSGI 1.0
compliant: it calls readline() on the input stream with a size
argument, which WSGI 1.0 does not require servers to support. The
'cgi' module is also one of the main reasons that no one actually
adheres to WSGI 1.0. This still hasn't been addressed, either by a
proper amendment to the WSGI 1.0 specification or by a new WSGI 1.1
specification that allows a size hint to readline().
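
To make the readline() point concrete: code that wants to stay within
WSGI 1.0 and still hand a parser something with readline(size) can
layer the hint on top of plain read(n) calls. Rough sketch only;
HintedInput, remaining and chunk_size are names I made up, and this is
not taken from Werkzeug or from any server:

```python
class HintedInput:
    """Hypothetical wrapper giving readline(size) using only read(n)."""

    def __init__(self, stream, remaining, chunk_size=8192):
        self._stream = stream          # e.g. environ['wsgi.input']
        self._remaining = remaining    # bytes left, from CONTENT_LENGTH
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill(self):
        # Pull one more chunk; return False once the body is exhausted.
        if self._remaining <= 0:
            return False
        chunk = self._stream.read(min(self._chunk_size, self._remaining))
        if not chunk:
            self._remaining = 0
            return False
        self._remaining -= len(chunk)
        self._buffer += chunk
        return True

    def readline(self, size=-1):
        limited = size is not None and size >= 0
        while True:
            newline = self._buffer.find(b"\n")
            if newline != -1:
                end = newline + 1
                break
            if limited and size <= len(self._buffer):
                end = size
                break
            if not self._fill():
                end = len(self._buffer)
                break
        if limited:
            end = min(end, size)
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line
```

The wrapper never reads past CONTENT_LENGTH and never calls readline()
on the underlying stream, with or without an argument.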

The Werkzeug form processing module is properly WSGI 1.0 compliant,
meaning that Werkzeug is possibly the only major WSGI framework to be
WSGI 1.0 compliant.
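
On the size concern Robert raises: whichever parser is used, an
application can bound what it accepts by reading the input in
fixed-size chunks up to CONTENT_LENGTH and refusing anything over a
limit. A minimal sketch, assuming the caller has already parsed
CONTENT_LENGTH into an int; spool_body, max_bytes and the default
sizes are made up for illustration, not something Werkzeug or 'cgi'
provides:

```python
import tempfile


def spool_body(stream, content_length, max_bytes=10 * 1024 * 1024,
               chunk_size=64 * 1024):
    """Copy at most content_length bytes from stream to a spooled file."""
    if content_length > max_bytes:
        # Refuse before reading anything from the client.
        raise ValueError("request body too large")
    # Stays in memory up to 1 MiB, then rolls over to a file on disk.
    spool = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # client sent less than it promised
        spool.write(chunk)
        remaining -= len(chunk)
    spool.seek(0)
    return spool
```

This does nothing about read_multi itself; it only caps what is handed
to whatever does the parsing.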

Graham

