[New-bugs-announce] [issue24764] cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition

Peter Landry report at bugs.python.org
Fri Jul 31 18:07:21 CEST 2015

New submission from Peter Landry:

`cgi.FieldStorage` can't parse a multipart body when a part sets a `Content-Length` header but has no `filename` in its `Content-Disposition`:

```
Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
...     'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
TypeError: must be str, not bytes
```

This happens because of a mismatch between the code that creates a temp file to write to and the code that chooses to read in binary mode or not:

* the presence of `filename` in the `Content-Disposition` header triggers creation of a binary mode file
* the presence of a `Content-Length` header for the part triggers a binary read

When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the error above.
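
The mismatch described above can be reproduced without `cgi` itself. The following is only a simplified model, not the actual `cgi.py` source: the helper names `make_part_file` and `store_part` and their boolean parameters are hypothetical, chosen to mirror the two independent decisions (file mode from `filename`, read mode from `Content-Length`).

```python
import tempfile


def make_part_file(has_filename, encoding="latin-1"):
    """Hypothetical stand-in: the temp file mode depends only on `filename`."""
    if has_filename:
        return tempfile.TemporaryFile("wb+")
    return tempfile.TemporaryFile("w+", encoding=encoding, newline="\n")


def store_part(body, has_filename, has_content_length, encoding="latin-1"):
    """Hypothetical stand-in: write one part body and return what was stored."""
    with make_part_file(has_filename, encoding) as f:
        if has_content_length or has_filename:
            # Content-Length forces a binary read, so the data stays bytes
            # regardless of how the file was opened.
            data = body
        else:
            # No Content-Length and no filename: text is decoded first.
            data = body.decode(encoding)
        f.write(data)  # raises TypeError when bytes meet a text-mode file
        f.seek(0)
        return f.read()


print(store_part(b"Larry", has_filename=False, has_content_length=False))
print(store_part(b"Larry", has_filename=True, has_content_length=True))
try:
    store_part(b"Larry", has_filename=False, has_content_length=True)
except TypeError as exc:
    print("TypeError:", exc)
```

Only the third combination fails, which matches the case in the traceback: `Content-Length` present, `filename` absent.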

I've reviewed the relevant RFCs, and I'm not really sure what the correct way to handle this is. I don't believe `Content-Length` is addressed for part bodies in the MIME spec[0], and HTTP has its own semantics[1].

At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit[2], include `Content-Length` on each part by default, and break when submitting POST data to a Python server.

I've attempted to make this work the way I'd expect, and attached a patch, but I'm really not sure it's the right decision. My patch somewhat naively accepts the existing semantics of `Content-Length`, which presume bytes, and treats the creation of a non-binary file as the "bug".
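
As a rough illustration of that idea (this is not the attached patch, and the names `open_part_file` and `store_with_length` are hypothetical), the file mode could be chosen from both headers so that a part with `Content-Length` always gets a binary file:

```python
import tempfile


def open_part_file(has_filename, has_content_length, encoding="latin-1"):
    """Hypothetical: open a binary file whenever the part will be read as bytes."""
    if has_filename or has_content_length:
        return tempfile.TemporaryFile("wb+")
    return tempfile.TemporaryFile("w+", encoding=encoding, newline="\n")


def store_with_length(body, has_filename):
    """Hypothetical: store a part that carries a Content-Length header."""
    with open_part_file(has_filename, has_content_length=True) as f:
        f.write(body)  # bytes into a binary file, so no TypeError
        f.seek(0)
        return f.read()


print(store_with_length(b"Larry", has_filename=False))
```

Under this assumption the field value comes back as `bytes` rather than `str`, so callers expecting text would have to decode it themselves. That trade-off is part of why the right fix is unclear.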

[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/

components: Library (Lib)
files: cgi_multipart.patch
keywords: patch
messages: 247751
nosy: Peter Landry, haypo
priority: normal
severity: normal
status: open
title: cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file40084/cgi_multipart.patch

Python tracker <report at bugs.python.org>
