<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">

<TITLE>py3k, cgi, email, and form-data</TITLE>

</HEAD>

<BODY>



<P><FONT SIZE=2>There's a major change in functionality in the cgi module between Python<BR>

2 and Python 3 which I've just run across: the behavior of<BR>

FieldStorage.read_multi, specifically when an HTTP app accepts a file<BR>

upload within a multipart/form-data payload.<BR>

<BR>

In Python 2, each part would be read in sequence within its own<BR>

FieldStorage instance. This allowed file uploads to be shunted to a<BR>

TemporaryFile (via make_file) as needed:<BR>

<BR>

&nbsp;&nbsp;&nbsp; klass = self.FieldStorageClass or self.__class__<BR>

&nbsp;&nbsp;&nbsp; part = klass(self.fp, {}, ib,<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; environ, keep_blank_values, strict_parsing)<BR>

&nbsp;&nbsp;&nbsp; # Throw first part away<BR>

&nbsp;&nbsp;&nbsp; while not part.done:<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; headers = rfc822.Message(self.fp)<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; part = klass(self.fp, headers, ib,<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; environ, keep_blank_values, strict_parsing)<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; self.list.append(part)<BR>

<BR>
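To illustrate the kind of shunting described above, here is a minimal<BR>
sketch of copying an upload into a TemporaryFile in fixed-size chunks.<BR>
It is not the Python 2 cgi code: spool_part and chunk_size are<BR>
hypothetical names, and the sketch ignores the boundary detection that<BR>
a real multipart reader has to do. It only shows that memory use can be<BR>
bounded by the chunk size rather than by the size of the upload:<BR>
<BR>

```python
import io
import tempfile


def spool_part(src, chunk_size=8192):
    """Copy a binary stream into a temporary file, one chunk at a time.

    Hypothetical helper for illustration only; it does not look for
    multipart boundaries the way a real form-data reader must.
    """
    tmp = tempfile.TemporaryFile("w+b")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means the source is exhausted.
            break
        tmp.write(chunk)
    # Rewind so the caller can read the spooled data from the start.
    tmp.seek(0)
    return tmp


# Stand-in for a request body; a real server would pass its input stream.
upload = io.BytesIO(b"x" * 100000)
spooled = spool_part(upload)
```

<BR>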

In Python 3 (svn revision 72466), the whole request body is read into<BR>

memory first via fp.read(), and then broken into separate parts in a<BR>

second step:<BR>

<BR>

&nbsp;&nbsp;&nbsp; klass = self.FieldStorageClass or self.__class__<BR>

&nbsp;&nbsp;&nbsp; parser = email.parser.FeedParser()<BR>

&nbsp;&nbsp;&nbsp; # Create bogus content-type header for proper multipart parsing<BR>

&nbsp;&nbsp;&nbsp; parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))<BR>

&nbsp;&nbsp;&nbsp; parser.feed(self.fp.read())<BR>

&nbsp;&nbsp;&nbsp; full_msg = parser.close()<BR>

&nbsp;&nbsp;&nbsp; # Get subparts<BR>

&nbsp;&nbsp;&nbsp; msgs = full_msg.get_payload()<BR>

&nbsp;&nbsp;&nbsp; for msg in msgs:<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fp = StringIO(msg.get_payload())<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; part = klass(fp, msg, ib, environ, keep_blank_values,<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strict_parsing)<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; self.list.append(part)<BR>

<BR>

This leaves the cgi module in Python 3 poorly suited to handling<BR>

multipart/form-data file uploads of any significant size. Since the<BR>

client determines the size of the request body, it also opens a server<BR>

up to an unexpected Denial of Service vector.<BR>

<BR>

I *think* the FeedParser is designed to accept incremental writes,<BR>

but I haven't yet found a way to do any kind of incremental reads<BR>

from it in order to shunt the uploaded data out to a tempfile again.<BR>

I'm secretly hoping Barry has a one-liner fix for this. ;)<BR>
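<BR>
For what it's worth, here is a small sketch of what I mean by<BR>
incremental writes. The boundary string, field names, and 16-character<BR>
chunk size are made up for the example. Feeding the body in pieces<BR>
works, but the subparts only become available after close(), and each<BR>
payload comes back as an in-memory string, so nothing is saved by<BR>
chunking the input:<BR>
<BR>

```python
from email.parser import FeedParser

# Made-up boundary and form fields, for illustration only.
boundary = "XyZ"
body = (
    "--XyZ\r\n"
    'Content-Disposition: form-data; name="title"\r\n'
    "\r\n"
    "hello\r\n"
    "--XyZ\r\n"
    'Content-Disposition: form-data; name="upload"; filename="a.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "file contents\r\n"
    "--XyZ--\r\n"
)

parser = FeedParser()
# Same trick as cgi.py: a synthetic header so the body parses as multipart.
parser.feed("Content-Type: multipart/form-data; boundary=%s\r\n\r\n" % boundary)

# Incremental writes: feed the body 16 characters at a time.
for start in range(0, len(body), 16):
    parser.feed(body[start:start + 16])

# The parsed message, with all of its subparts, only exists after close().
full_msg = parser.close()
parts = full_msg.get_payload()

# Each subpart payload is held entirely in memory as a str.
payloads = [part.get_payload() for part in parts]
```

<BR>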

<BR>

<BR>

Robert Brewer<BR>

fumanchu@aminus.org</FONT>

</P>


</BODY>

</HTML>