[Twisted-Python] Twisted web, giant-file POST forwarding and early bail-out.
Hi everyone,
I'm working on what is just my second project using Twisted Web, so I'm still a relative newbie on the subject.
I'm working on a project that uses Twisted Web as a simple authorization proxy. All requests to my proxy contain an authorization token and are either handled by the proxy or relayed to another server. For all GET requests and small POST requests this is not a problem. When I want to process large POST requests, however, I run into the limits of my understanding of how Twisted Web actually works.
1) I figured out that next to the 'process' in my request handler, I also need to override handleContentChunk, parse the form body parts in the first chunk myself, and open a proxy connection (self.agent.request) if the authorization token checks out.
2) When it comes to appending the data received in handleContentChunk, and if needed throttling the client when the server can't keep up, I can't figure out how to connect handleContentChunk and my self.agent.request instance.
3) When the token does not check out, or the connection to the server fails, it remains a mystery to me how I should raise an error in such a way that I can send a proper error message to the client without first having to accept the whole large file. That is, it seems rather silly that I would know things failed after the first POST body chunk, but would have to wait for and accept hundreds of megabytes or maybe even a few gigabytes of POST data before I can notify the client that something went wrong.
It seems I am either missing something blindingly obvious or Twisted Web simply isn't meant to be used this way. I hope someone can give me some directions on how to make this giant-file-POST forwarding and early bail-out scenario work with Twisted Web.
T.I.A.
Rob Meijer
On 07/09/2013 09:04 AM, Rob Meijer wrote:
Hi everyone,
I'm working on what is just my second project using Twisted Web, so I'm still a relative newbie on the subject.
I'm working on a project that uses Twisted Web as a simple authorization proxy. All requests to my proxy contain an authorization token and are either handled by the proxy or relayed to another server. For all GET requests and small POST requests this is not a problem. When I want to process large POST requests, however, I run into the limits of my understanding of how Twisted Web actually works.
1) I figured out that next to the 'process' in my request handler, I also need to override handleContentChunk, parse the form body parts in the first chunk myself, and open a proxy connection (self.agent.request) if the authorization token checks out.
2) When it comes to appending the data received in handleContentChunk, and if needed throttling the client if the server couldn't keep up, I can't figure out how to connect handleContentChunk and my self.agent.request instance.
You probably want to read up on the producer/consumer stuff in Twisted. In particular, if you're using t.w.client.Agent, request bodies are supplied by an IBodyProducer.

http://twistedmatrix.com/documents/current/web/howto/client.html

Essentially, you need an IBodyProducer that maps to the incoming transport via the request, which I guess would look something like this:

    from twisted.internet import defer
    from twisted.web.iweb import IBodyProducer, UNKNOWN_LENGTH
    from zope.interface import implementer

    @implementer(IBodyProducer)
    class RequestProducer(object):
        def __init__(self, request):
            self.req = request

        def startProducing(self, consumer):
            # Called by Agent once it is ready to take the body.
            self.d = defer.Deferred()
            self.consumer = consumer
            return self.d

        def pauseProducing(self):
            # The server can't keep up: stop reading from the client.
            self.req.transport.pauseProducing()

        def resumeProducing(self):
            # The server has caught up: read from the client again.
            self.req.transport.resumeProducing()

        def stopProducing(self):
            # FIXME: what to do here...
            self.req.transport.loseConnection()

        def finish(self):
            # Firing the Deferred tells Agent the body is complete.
            self.d.callback(None)

...and you'll have code like this on the request object:

    def gotLength(self, length):
        self.bodyprod = RequestProducer(self)
        if length:
            self.bodyprod.length = length
        else:
            self.bodyprod.length = UNKNOWN_LENGTH
        # url and headers are placeholders for the upstream target.
        self.out_req = self.agent.request(
            b'POST', url, headers, self.bodyprod
        )

    def handleContentChunk(self, data):
        ...
        if data_to_be_forwarded:
            self.bodyprod.consumer.write(data)
        if some_done_condition:
            self.bodyprod.finish()
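To make the throttling half of question 2 concrete without pulling in Twisted, here is a minimal stdlib-only sketch of the pause/resume handshake. FakeTransport, ThrottledRelay, and the high/low water marks are hypothetical names invented for this illustration. In a real proxy the pause and resume calls would go to the request's transport, and, as far as I understand it, the consumer that Agent passes to startProducing would be the one signalling when the server side falls behind.

```python
from collections import deque


class FakeTransport:
    """Stand-in for the client-facing transport (hypothetical, for illustration)."""

    def __init__(self):
        self.paused = False

    def pauseProducing(self):
        self.paused = True

    def resumeProducing(self):
        self.paused = False


class ThrottledRelay:
    """Queue chunks for a slow upstream and throttle the client.

    The client transport is paused once more than `high_water` bytes are
    queued, and resumed when the backlog is back at `low_water` or below.
    """

    def __init__(self, transport, high_water=64 * 1024, low_water=16 * 1024):
        self.transport = transport
        self.high_water = high_water
        self.low_water = low_water
        self.pending = deque()
        self.pending_bytes = 0

    def chunk_received(self, data):
        # Called for every body chunk read from the client.
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes > self.high_water and not self.transport.paused:
            self.transport.pauseProducing()

    def upstream_ready(self):
        # Called when the upstream can take one more chunk.
        # Returns that chunk, or None if nothing is queued.
        if not self.pending:
            return None
        data = self.pending.popleft()
        self.pending_bytes -= len(data)
        if self.transport.paused and self.pending_bytes <= self.low_water:
            self.transport.resumeProducing()
        return data
```

The two thresholds are there so the client is not paused and resumed on every single chunk; the specific values are arbitrary.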
3) When the token does not check out, or the connection to the server fails, it remains a mystery to me how I should throw an error in such a way that it allows me to send a proper error message to the client, while not having to first accept the whole large file.
This is sort of a problem with HTTP. The client will probably keep sending the data. The best you can do is write an HTTP error to the transport then throw the connection away, or blackhole all future content chunks.
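The two options just mentioned (write an error and drop the connection, or blackhole the remaining chunks) can be sketched as a small state machine. This is an illustration only, not Twisted API: RecordingTransport, EarlyBailHandler, is_authorized, and forward are hypothetical names, and the 403 response is just one plausible choice of error. Keep in mind that a client still busy sending may never read the response once the connection is closed.

```python
class RecordingTransport:
    """Stand-in transport that records what was written (hypothetical)."""

    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, data):
        self.written.append(data)

    def loseConnection(self):
        self.closed = True


class EarlyBailHandler:
    """Authorize on the first chunk, then either forward or reject."""

    def __init__(self, transport, is_authorized, forward, close_on_reject=True):
        self.transport = transport
        self.is_authorized = is_authorized  # callable(first_chunk) -> bool
        self.forward = forward              # callable(chunk), e.g. consumer.write
        self.close_on_reject = close_on_reject
        self.state = "undecided"

    def handle_chunk(self, data):
        if self.state == "rejected":
            return  # blackhole: silently discard everything after a rejection
        if self.state == "undecided":
            if not self.is_authorized(data):
                self._reject()
                return
            self.state = "forwarding"
        self.forward(data)

    def _reject(self):
        self.state = "rejected"
        body = b"authorization failed\n"
        self.transport.write(b"".join([
            b"HTTP/1.1 403 Forbidden\r\n",
            b"Content-Type: text/plain\r\n",
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n",
            b"Connection: close\r\n",
            b"\r\n",
            body,
        ]))
        if self.close_on_reject:
            self.transport.loseConnection()
```

With close_on_reject=False the handler keeps the connection open and simply discards the rest of the upload, which costs bandwidth but gives the client a better chance of eventually reading the error.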
On Tue, Jul 9, 2013 at 10:41 AM, Phil Mayers wrote:
This is sort of a problem with HTTP. The client will probably keep sending the data.
Yes, the only way you can interrupt the client while it is sending a request is to close the connection, which means that the client will not read any error response you send.
-- mithrandi, i Ainil en-Balandor, a faer Ambar
On 07/09/2013 04:04 AM, Rob Meijer wrote:
3) When the token does not check out, or the connection to the server fails, it remains a mystery to me how I should throw an error in such a way that it allows me to send a proper error message to the client, while not having to first accept the whole large file. That is, it seems rather silly that I would know things failed after the first POST body chunk, but would have to wait for and accept hundreds of megabytes or maybe even a few gigabytes of post data before I can notify the client that something went wrong.
HTTP clients can send a "Expects: 100-continue" header (or something like that), which tells the server it should give an early rejection or acceptance before the client sends the data, in *addition* to the final response. You would still need to write some code to support this, but it is possible.
On 07/09/2013 07:30 AM, Itamar Turner-Trauring wrote:
HTTP clients can send a "Expects: 100-continue" header (or something like that), which tells the server it should give an early rejection or acceptance before the client sends the data, in *addition* to the final response. You would still need to write some code to support this, but it is possible.
Although this only lets you reject based on headers, not body.
On 09.07.2013 13:40, Itamar Turner-Trauring wrote:
On 07/09/2013 07:30 AM, Itamar Turner-Trauring wrote:
HTTP clients can send a "Expects: 100-continue" header (or something like that), which tells the server it should give an early rejection or acceptance before the client sends the data, in *addition* to the final response. You would still need to write some code to support this, but it is possible. Although this only lets you reject based on headers, not body.
If you have a Content-Length header, that works; for chunked encoding, not so much.
But client-side support for 100-continue is spotty; at least the Python stdlib httplib client mishandles 100-continue requests in an attempt to work around Microsoft IIS strangeness.
Michael
-- Michael Schlenker, Software Architect, CONTACT Software GmbH
On Jul 9, 2013, at 4:30 AM, Itamar Turner-Trauring wrote:
On 07/09/2013 04:04 AM, Rob Meijer wrote:
3) When the token does not check out, or the connection to the server fails, it remains a mystery to me how I should throw an error in such a way that it allows me to send a proper error message to the client, while not having to first accept the whole large file. That is, it seems rather silly that I would know things failed after the first POST body chunk, but would have to wait for and accept hundreds of megabytes or maybe even a few gigabytes of post data before I can notify the client that something went wrong.
HTTP clients can send a "Expects: 100-continue" header (or something like that), which tells the server it should give an early rejection or acceptance before the client sends the data, in *addition* to the final response. You would still need to write some code to support this, but it is possible.
FYI, it's "Expect: 100-continue" ;-). http://www.w3.org/Protocols/rfc2616/rfc2616-sec8. -glyph
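For what it's worth, the early decision discussed in this subthread could be sketched roughly like this. It is a stdlib-only illustration, not Twisted API: interim_response, token_ok, and the X-Auth-Token header are hypothetical, and it assumes the authorization token travels in a request header rather than in the form body, since only headers are available at this point. The size check only works when a Content-Length header is present; with chunked encoding the size is unknown up front.

```python
def interim_response(headers, token_ok, max_body=None):
    """Decide what to send back before any body bytes arrive.

    `headers` maps lower-cased header names to string values.
    Returns raw bytes to write immediately, or None when the client
    did not send "Expect: 100-continue" and will just send the body.
    """
    if headers.get("expect", "").strip().lower() != "100-continue":
        return None

    if not token_ok(headers):
        # Early rejection: the client should not send the body at all.
        return (b"HTTP/1.1 403 Forbidden\r\n"
                b"Connection: close\r\n"
                b"Content-Length: 0\r\n\r\n")

    length = headers.get("content-length", "").strip()
    if max_body is not None and length.isdigit() and int(length) > max_body:
        # Only possible with Content-Length; chunked uploads skip this check.
        return (b"HTTP/1.1 413 Request Entity Too Large\r\n"
                b"Connection: close\r\n"
                b"Content-Length: 0\r\n\r\n")

    # Early acceptance: the final response still follows after the body.
    return b"HTTP/1.1 100 Continue\r\n\r\n"


def check_token(headers):
    # Hypothetical check: the token is carried in an X-Auth-Token header.
    return headers.get("x-auth-token") == "secret"
```

A caller would run this once all request headers are in, write the returned bytes if there are any, and only start forwarding the body when the answer was the 100 Continue line or None.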
participants (6)
- Glyph
- Itamar Turner-Trauring
- Michael Schlenker
- Phil Mayers
- Rob Meijer
- Tristan Seligmann