<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, 15 Feb 2017 at 08:14 Ben Hoyt <<a href="mailto:benhoyt@gmail.com">benhoyt@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg">I posted this on StackOverflow [1], but I'm posting it here as well, as I believe this is a bug (or at least quirk) in cgi.FieldStorage where you can't access a file upload properly if "filename=" is not present in the MIME part's Content-Disposition header. There are a couple of related bugs open (and closed) on bugs.python.org, but not quite this issue.<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Is it legitimate for cgi.FieldStorage to use the presence of "filename=" to determine "this is a binary file" (in which case this is not a bug and my client is just buggy), or is this a bug? I lean towards the latter as the spec indicates that the filename is optional [2].</div></div></blockquote><div><br></div><div>Assuming this isn't a recent change in semantics, I would say this is now a quirk, considering how old the module is and that people probably rely on its current semantics.</div><div><br></div><div>-Brett</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Copying from my StackOverflow question, including a test/repro case:<div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><div class="gmail_msg">When I use `cgi.FieldStorage` to parse a `multipart/form-data` request (or any web framework like Pyramid which uses `cgi.FieldStorage`) I have trouble processing file uploads from certain clients which don't provide a `filename=file.ext` in the part's `Content-Disposition` header.</div><div 
class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">If the `filename=` option is missing, `FieldStorage()` tries to decode the contents of the file as UTF-8 and return a string. And obviously many files are binary and not UTF-8 and as such give bogus results.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">For example:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> >>> import cgi</div><div class="gmail_msg"> >>> import io</div><div class="gmail_msg"> >>> body = (b'--KQNTvuH-itP09uVKjjZiegh7\r\n' +</div><div class="gmail_msg"> ... b'Content-Disposition: form-data; name=payload\r\n\r\n' +</div><div class="gmail_msg"> ... b'\xff\xd8\xff\xe0\x00\x10JFIF')</div><div class="gmail_msg"> >>> env = {</div><div class="gmail_msg"> ... 'REQUEST_METHOD': 'POST',</div><div class="gmail_msg"> ... 'CONTENT_TYPE': 'multipart/form-data; boundary=KQNTvuH-itP09uVKjjZiegh7',</div><div class="gmail_msg"> ... 'CONTENT_LENGTH': len(body),</div><div class="gmail_msg"> ... 
}</div><div class="gmail_msg"> >>> fs = cgi.FieldStorage(fp=io.BytesIO(body), environ=env)</div><div class="gmail_msg"> >>> (fs['payload'].filename, fs['payload'].file.read())</div><div class="gmail_msg"> (None, '����\x00\x10JFIF')</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Browsers, and *most* HTTP libraries do include the `filename=` option for file uploads, but I'm currently dealing with a client that doesn't (and omitting the `filename` does seem to be valid according to the spec).</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Currently I'm using a pretty hacky workaround by subclassing `FieldStorage` and replacing the relevant `Content-Disposition` header with one that does have the filename:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> import cgi</div><div class="gmail_msg"> import os</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> class FileFieldStorage(cgi.FieldStorage):</div><div class="gmail_msg"> """To use, subclass FileFieldStorage and override _file_fields with a tuple</div><div class="gmail_msg"> of the names of the file field(s). 
You can also override _file_name with</div><div class="gmail_msg"> the filename to add.</div><div class="gmail_msg"> """</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> _file_fields = ()</div><div class="gmail_msg"> _file_name = 'file_name'</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> def __init__(self, fp=None, headers=None, outerboundary=b'',</div><div class="gmail_msg"> environ=os.environ, keep_blank_values=0, strict_parsing=0,</div><div class="gmail_msg"> limit=None, encoding='utf-8', errors='replace'):</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> if self._file_fields and headers and headers.get('content-disposition'):</div><div class="gmail_msg"> content_disposition = headers['content-disposition']</div><div class="gmail_msg"> key, pdict = cgi.parse_header(content_disposition)</div><div class="gmail_msg"> if (key == 'form-data' and pdict.get('name') in self._file_fields and</div><div class="gmail_msg"> 'filename' not in pdict):</div><div class="gmail_msg"> del headers['content-disposition']</div><div class="gmail_msg"> quoted_file_name = self._file_name.replace('"', '\\"')</div><div class="gmail_msg"> headers['content-disposition'] = '{}; filename="{}"'.format(</div><div class="gmail_msg"> content_disposition, quoted_file_name)</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> super().__init__(fp=fp, headers=headers, outerboundary=outerboundary,</div><div class="gmail_msg"> environ=environ, keep_blank_values=keep_blank_values,</div><div class="gmail_msg"> strict_parsing=strict_parsing, limit=limit,</div><div class="gmail_msg"> encoding=encoding, errors=errors)</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Using the `body` and `env` in my first test, this works now:</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"> >>> class TestFieldStorage(FileFieldStorage):</div><div 
class="gmail_msg"> ... _file_fields = ('payload',)</div><div class="gmail_msg"> >>> fs = TestFieldStorage(fp=io.BytesIO(body), environ=env)</div><div class="gmail_msg"> >>> (fs['payload'].filename, fs['payload'].file.read())</div><div class="gmail_msg"> ('file_name', b'\xff\xd8\xff\xe0\x00\x10JFIF')</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Is there some way to avoid this hack and tell `FieldStorage` not to decode as UTF-8? It would be nice if you could provide `encoding=None` or something, but it doesn't look like it supports that.</div></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Thanks,</div><div class="gmail_msg">Ben.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">[1] <a href="https://stackoverflow.com/questions/42213318/cgi-fieldstorage-with-multipart-form-data-tries-to-decode-binary-file-as-utf-8-e" class="gmail_msg" target="_blank">https://stackoverflow.com/questions/42213318/cgi-fieldstorage-with-multipart-form-data-tries-to-decode-binary-file-as-utf-8-e</a></div><div class="gmail_msg">[2] <a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1" class="gmail_msg" target="_blank">https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1</a></div><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div></div>
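<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">To make the "bogus results" in the first example concrete, here is a minimal sketch that uses only `bytes.decode`, with no `cgi` involved. It assumes the same `encoding='utf-8', errors='replace'` defaults that appear in the `__init__` signature of the workaround above. The variable names are illustrative only.</div>
<pre>
```python
# Leading bytes of a JPEG file, as in the repro above; they are not valid UTF-8.
raw = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# errors='replace' swaps every undecodable byte for U+FFFD.
text = raw.decode('utf-8', errors='replace')

# Encoding the string again cannot restore the original bytes,
# because the replacement character carries no trace of what it replaced.
round_tripped = text.encode('utf-8')

print(repr(text))
print(round_tripped == raw)
```
</pre>
<div class="gmail_msg">The decoding is lossy, so once the part has been turned into a string there is no reliable way to get the uploaded bytes back.</div>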
_______________________________________________<br class="gmail_msg">
Python-Dev mailing list<br class="gmail_msg">
<a href="mailto:Python-Dev@python.org" class="gmail_msg" target="_blank">Python-Dev@python.org</a><br class="gmail_msg">
<a href="https://mail.python.org/mailman/listinfo/python-dev" rel="noreferrer" class="gmail_msg" target="_blank">https://mail.python.org/mailman/listinfo/python-dev</a><br class="gmail_msg">
</blockquote></div></div>
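<div><br></div><div>On the question of avoiding the UTF-8 decoding without patching headers: one possible alternative, sketched below, is to bypass `cgi.FieldStorage` and parse the body with the standard library's `email` package, which can return a part's payload as bytes. This is only a sketch under some assumptions. `parse_form_data` is a hypothetical helper name, the whole body is read into memory, and the test body adds a closing boundary line that the original repro omits. The `email` parser is designed for mail rather than HTTP uploads, so it may not handle every client or every binary payload the way a dedicated multipart parser would.</div>
<pre>
```python
from email.parser import BytesParser


def parse_form_data(content_type, body):
    """Return {field name: (filename or None, raw bytes)} for a multipart body.

    Hypothetical helper, not part of cgi or email.
    """
    # The email parser needs the Content-Type header (with its boundary)
    # in front of the body to recognise the multipart structure.
    prefix = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n'
    message = BytesParser().parsebytes(prefix + body)

    fields = {}
    if message.is_multipart():
        for part in message.get_payload():
            name = part.get_param('name', header='content-disposition')
            # decode=True asks for the payload as bytes, whether or not
            # the part has a filename.
            fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields


# Same part as in the repro, plus a closing boundary line.
body = (b'--KQNTvuH-itP09uVKjjZiegh7\r\n'
        b'Content-Disposition: form-data; name=payload\r\n\r\n'
        b'\xff\xd8\xff\xe0\x00\x10JFIF\r\n'
        b'--KQNTvuH-itP09uVKjjZiegh7--\r\n')
fields = parse_form_data(
    'multipart/form-data; boundary=KQNTvuH-itP09uVKjjZiegh7', body)
print(fields['payload'])
```
</pre>
<div>The trade-off is that this gives up what `FieldStorage` provides on top of parsing (the `limit` argument, file-backed storage for large uploads, and integration with frameworks that expect a `FieldStorage` object), so it fits best where the request body is small and you control the handler.</div>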