py3k, cgi, email, and form-data
There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload. In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed: klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step: klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector). I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;) Robert Brewer fumanchu@aminus.org
Robert Brewer wrote:
There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload.
In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed:
klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step:
klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector).
I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;)
It think what it needs is for the email.parser.FeedParser class to have a feed_from_file() method, supported by the class BufferedSubFile. The BufferedSubFile class keeps an internal list of lines. Perhaps it could also have a list of files, so that when the list of lines becomes empty it can continue by reading lines from the files instead, dropping a file from the list when it reaches the end, something like this: [Module feedparser.py] ... class BufferedSubFile(object): ... def __init__(self): # The last partial line pushed into this object. self._partial = '' # The list of full, pushed lines, in reverse order self._lines = [] # The list of files. self._files = [] ... ... def readline(self): while not self._lines and self._files: data = self._files[0].read(MAX_DATA_SIZE) if data: self.push(data) else: del self._files[0] if not self._lines: if self._closed: return '' return NeedMoreData ... def push_file(self, data_file): """Push some new data from a file into this object.""" self._files.append(data_file) ... and then: ... class FeedParser: ... def feed(self, data): """Push more data into the parser.""" self._input.push(data) self._call_parse() def feed_from_file(self, data_file): """Push more data from a file into the parser.""" self._input.push_file(data_file) self._call_parse() ...
2009/5/12 Robert Brewer
There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload.
In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed:
klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step:
klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector).
I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;)
FWIW, Werkzeug gave up on 'cgi' module for form passing and implements its own. Not sure whether this issue in Python 3.0 was one of the reasons or not. I know one of the reasons was because cgi.FieldStorage is not WSGI 1.0 compliant. One of the main reasons that no one actually adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't been addressed by a proper amendment to WSGI 1.0 specification or a new WSGI 1.1 specification to allow a hint to readline(). The Werkzeug form processing module is properly WSGI 1.0 compliant, meaning that Wekzeug is possibly the only major WSGI framework to be WSGI compliant. Graham
Graham Dumpleton wrote:
2009/5/12 Robert Brewer
: There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload.
In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed:
klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step:
klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part)
This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector).
I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;)
FWIW, Werkzeug gave up on 'cgi' module for form passing and implements its own.
Not sure whether this issue in Python 3.0 was one of the reasons or not. I know one of the reasons was because cgi.FieldStorage is not WSGI 1.0 compliant. One of the main reasons that no one actually adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't been addressed by a proper amendment to WSGI 1.0 specification or a new WSGI 1.1 specification to allow a hint to readline().
The Werkzeug form processing module is properly WSGI 1.0 compliant, meaning that Wekzeug is possibly the only major WSGI framework to be WSGI compliant.
FWIW, I just added a replacement for the cgi module to CherryPy over the weekend for the same reasons. It's in the python3 branch but will get backported to CherryPy 3.2 for Python 2.x. Robert Brewer fumanchu@aminus.org
participants (3)
-
Graham Dumpleton
-
MRAB
-
Robert Brewer