[Twisted-Python] reading multipart/form-data headers
Hello All,

You can find a sample HTTP POST request using HTTP multipart/form-data at the end of this message.

The server that handles this request is using twisted so I end up with a Request object. Is there a way I can extract the file name ("image008.jpg") from this stream? I'm looking at the source of cgi.parse_multipart() and it seems to be ignored.

Best regards
Burak

PS:

POST /put HTTP/1.1
Host: localhost:7111
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------352471062160373366296932264
Content-Length: 382691

-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="name"

a
-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="version"

1
-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="data"; filename="image008.jpg"
Content-Type: image/jpeg

(...)
On Aug 11, 2016, at 8:55 AM, Burak Arslan <burak.arslan@arskom.com.tr> wrote:
Hello All,
You can find a sample HTTP POST request using HTTP multipart/form-data at the end of this message.
The server that handles this request is using twisted so I end up with a Request object. Is there a way I can extract the file name ("image008.jpg") from this stream? I'm looking at the source of cgi.parse_multipart() and it seems to be ignored.
Sadly Twisted just calls into cgi.parse_multipart and so it is in fact ignored. You might be able to re-parse the request body (request.content.seek(0); request.content.read()) with something like <https://docs.python.org/2.7/library/email.mime.html#email.mime.multipart.MIMEMultipart> or <https://github.com/mailgun/flanker> to extract more information about the MIME.

It would definitely be better for Twisted to have more robust facilities for dealing with request inputs, particularly to be able to process large uploads as a stream rather than an individual message (and such an API for form post uploads should obviously include the content disposition filename). See <https://twistedmatrix.com/trac/ticket/288> for more discussion :).

Thanks for using Twisted, and sorry about this shortcoming.

-g
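A minimal sketch of the re-parsing idea above, assuming Python 3 and only the standard library email package. The function name parse_multipart_form is hypothetical, and the content_type and body arguments stand in for the request's Content-Type header value and the bytes returned by request.content.read(). The parser needs the boundary, so the Content-Type header is prepended to the body before parsing. Binary payloads should be checked for a byte-exact round trip before relying on this.

```python
from email import message_from_bytes


def parse_multipart_form(content_type, body):
    """Return a list of (field name, filename or None, payload bytes).

    Hypothetical helper: content_type is the request's Content-Type
    header value, body is the raw request body as bytes.
    """
    # The boundary lives in the Content-Type header, so build a minimal
    # MIME message out of that header plus the raw body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + body)

    fields = []
    for part in message.get_payload():
        # Both values come from the part's Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        fields.append((name, filename, part.get_payload(decode=True)))
    return fields
```

For the sample request in this thread, the entry for the "data" field would carry "image008.jpg" as its filename, while the "name" and "version" fields would have None there.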
I have some code that makes a best effort to parse the request body in streaming mode... but it uses a fork based on the code submitted for this ticket: http://twistedmatrix.com/trac/ticket/6928

Diff for the fork: https://github.com/chevah/twisted/compare/6928-http-100-accept

It relies on the fact that the request will call resource.headerReceive() before the actual body is consumed.

Form handling code using this fork: https://gist.github.com/adiroiban/7f593d6d18113aae797ad081e07f4745

It uses werkzeug.http.parse_options_header for parsing the headers.

If your POST requests are just a few bytes, you can just use request.content.seek(0); request.content.read() as suggested by Glyph and redirect the content to the MultiPartFormData protocol.

For my project I need to handle files larger than 5GB, so I ended up with the modified request/resource.

Good luck!

--
Adi Roiban
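To illustrate what parsing a part header's options involves, here is a standard-library stand-in. It is not werkzeug's implementation, and split_options_header is a hypothetical name. It splits a header value such as a Content-Disposition line into its main value and a dict of options, which is the shape a streaming parser needs for each part.

```python
from email.message import Message


def split_options_header(value):
    """Split 'form-data; name="x"' into ('form-data', {'name': 'x'}).

    Hypothetical helper built on the stdlib email package.
    """
    msg = Message()
    msg["Content-Disposition"] = value
    # get_params() yields (key, value) pairs; the first pair holds the
    # main value with an empty string as its value.
    params = msg.get_params(header="content-disposition")
    main_value = params[0][0]
    options = dict(params[1:])
    return main_value, options


main_value, options = split_options_header(
    'form-data; name="data"; filename="image008.jpg"'
)
```

Here options.get("filename") gives the uploaded file's name, or None for plain form fields.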
Hey Adi, hey Glyph,

Thanks a lot for your answers.

On 08/12/16 12:28, Adi Roiban wrote:
On 11 August 2016 at 21:52, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Thanks for using Twisted, and sorry about this shortcoming.
Nothing to be sorry about, Twisted is made of man-years of good work. Once you twist your point of view enough, it becomes quite elegant and predictable. Some wars *had* to be fought uphill while integrating with Twisted's HTTP implementation but it's the way things are, I'm not complaining.
I have some code that makes a best effort to parse the request body in streaming mode... but it uses a fork based on the code submitted for this ticket: http://twistedmatrix.com/trac/ticket/6928
This is for a library I'm developing, so I can only depend on what's already released. However, thanks for the code -- it will certainly give me ideas.
For my project I need to handle files larger than 5GB, so I ended up with the modified request/resource
As I said, I can't make any assumptions on file size but I *think* I can pretend that my requests fit in memory as long as I keep them in memory-mapped files. mmap is wonderful -- it's both a file and a string!

With that assumption, this is what I came up with: https://github.com/plq/spyne/blob/7f52ab0f11773535c6a73702b4b838b49ecdd9e6/s...

I'd love to hear your feedback about it. Do you think I can get away with relying on mmap here?

Best regards,
Burak
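The "both a file and a string" point can be sketched as follows, assuming Python 3. This is not the linked spyne code: spool_and_map is a hypothetical helper, and the chunks are made-up stand-ins for request body data. It writes the body to a temporary file, maps it read-only, and then uses the same object through both bytes-style and file-style calls.

```python
import mmap
import tempfile


def spool_and_map(chunks):
    """Write body chunks to a temp file; return (file, read-only mmap).

    Hypothetical helper for illustration only.
    """
    spool = tempfile.TemporaryFile()
    for chunk in chunks:
        spool.write(chunk)
    spool.flush()
    # Length 0 maps the whole file; mmap raises ValueError if it is empty.
    return spool, mmap.mmap(spool.fileno(), 0, access=mmap.ACCESS_READ)


spool, body = spool_and_map([b"--sep\r\n", b"payload\r\n", b"--sep--\r\n"])
try:
    start = body.find(b"payload")      # bytes-style search
    chunk = body[start:start + 7]      # bytes-style slice, copies only this slice
    body.seek(0)
    first_line = body.readline()       # file-style read
finally:
    body.close()
    spool.close()
```

One thing to keep in mind with this approach is that the mapping still consumes address space, so very large uploads may not map on a 32-bit process.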