twisted.web2.stream.IStream.reset()?
Hi David and all,

I stumbled across your twisted web2-client branch via Google. I need to make a specialized HTTP load generator that uses Keep-Alives. twisted.web2.client seems the best way, so I'm trying to make this branch work for me.

My load generator needs to make a bunch of sequences of authenticated POSTs, picking up a cookie along the way.

I've stumbled onto a bug: the first request will return a 401 Unauthorized. HTTPClientFactory.handleStatus_401 deals with this...but the postdata is gone.

This code generates the new request:

    req = Request(request.method, request.uri, request.args, request.headers, request.stream)

It looks like web2.stream.IStream is single-use - it returns None when there's no more data. I think this - or maybe an IResettableStream subclass - needs a reset() method.

I've got a patch that allowed me to proceed. I wouldn't recommend applying it as-is - I only messed with the MemoryStream that I'm using, and there's no unit test. I want to send this in before I forget about it, though.

Regards,
Scott

--
Scott Lamb <http://www.slamb.org/>
On Fri, 23 Sep 2005 14:27:41 -0700, Scott Lamb <slamb@slamb.org> wrote:
Hi David and all,
I stumbled across your twisted web2-client branch from google. I need to make a specialized HTTP load generator that uses Keep-Alives. twisted.web2.client seems the best way, so I'm trying to make this branch work for me.
My load generator needs to make a bunch of sequences of authenticated POSTs, picking up a cookie along the way.
I've stumbled onto a bug: the first request will return a 401 Unauthorized. HTTPClientFactory.handleStatus_401 deals with this...but the postdata is gone.
This code generates the new request:
req = Request(request.method, request.uri, request.args, request.headers, request.stream)
It looks like web2.stream.IStream is single-use - it returns None when there's no more data. I think this - or maybe an IResettableStream subclass - needs a reset() method.
I've got a patch that allowed me to proceed. I wouldn't recommend applying as-is - I only messed with the MemoryStream that I'm using, and there's no unit test. I want to send this in before I forget about it, though.
Many stream sources cannot be rewound. Implementing reset for IStream would require many implementations to hold their entire contents in memory. This is completely unreasonable. Creating a separate interface for streams which could be rewound would avoid this problem, but at the same time limit the actual implementations your code could work with to an extreme subset.

Instead of extending IStream or creating a new interface, you probably want to wrap request.stream before passing it to the request. The wrapper can take care of providing restartability. It can also prevent the concurrency bugs that handing a single stream to multiple Requests will present.

Jp
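The wrapper idea above could look roughly like the following. This is only a minimal sketch, not code from twisted.web2: `ChunkStream` is a hypothetical stand-in for a single-use stream whose `read()` returns a chunk or `None` at the end, and `ReplayableSource` is a hypothetical wrapper name. It assumes synchronous reads (no Deferreds) and in-memory buffering, so it only suits small bodies.

```python
class ChunkStream:
    """Hypothetical stand-in for a single-use stream.

    read() returns the next chunk, or None once the data is exhausted.
    """

    def __init__(self, chunks):
        self._chunks = list(chunks)  # private copy, so streams stay independent

    def read(self):
        return self._chunks.pop(0) if self._chunks else None


class ReplayableSource:
    """Hypothetical wrapper that buffers a stream once and replays it.

    Assumption: the whole body fits in memory.
    """

    def __init__(self, stream):
        self._stream = stream
        self._buffer = None

    def open(self):
        # Drain the wrapped stream on first use only.
        if self._buffer is None:
            self._buffer = []
            while True:
                chunk = self._stream.read()
                if chunk is None:
                    break
                self._buffer.append(chunk)
        # Every caller gets its own stream, so two requests never
        # compete for the same read position.
        return ChunkStream(self._buffer)


source = ReplayableSource(ChunkStream([b"user=", b"scott"]))
first_attempt = source.open()
retry_attempt = source.open()
```

Because each `open()` call returns a separate stream object, the original request and the retry after a 401 can each read the body from the start without interfering with one another.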
On Sep 23, 2005, at 2:37 PM, Jp Calderone wrote:
Instead of extending IStream or creating a new interface, you probably want to wrap request.stream before passing it to the request. The wrapper can take care of providing restartability. It can also prevent the concurrency bugs that handing a single stream to multiple Requests will present.
Makes sense. I'm not going to write it today, though. I might once things calm down around here (hopefully next week) unless someone beats me to it.

By the way, there's something funny going on with mail headers from you -> the list -> me. The "To:" header I see is this:

    To: Discussion@calvin.slamb.org, of@calvin.slamb.org, twisted.web@calvin.slamb.org, Nevow@calvin.slamb.org, and Woven <twisted-web@twistedmatrix.com>

I think my SMTP server appends its hostname to "To" parts without one. It probably received this:

    To: Discussion of twisted.web, Nevow, and Woven <twisted-web@twistedmatrix.com>

instead of this:

    To: "Discussion of twisted.web, Nevow, and Woven" <twisted-web@twistedmatrix.com>

--
Scott Lamb <http://www.slamb.org/>
On 23 Sep 2005, at 14:37, Jp Calderone wrote:
Many stream sources cannot be rewound. Implementing reset for IStream would require many implementations to hold their entire contents in memory. This is completely unreasonable. Creating a separate interface for streams which could be rewound would avoid this problem, but at the same time limit the actual implementations your code could work with to an extreme subset.
Instead of extending IStream or creating a new interface, you probably want to wrap request.stream before passing it to the request. The wrapper can take care of providing restartability. It can also prevent the concurrency bugs that handing a single stream to multiple Requests will present.
On second thought, I don't buy this at all. You're saying that this wrapper should provide the buffering itself to provide restartability? Will the data be large / should this buffering happen in memory or on disk? For MemoryStream and FileStream, the answers differ. In both cases, the buffering is silly, though; they have all the information to reset themselves.

CompoundStream, TruncaterStream, and PostTruncaterStream could all be made resettable, provided the streams that they operate on are.

Only ProducerStream differs. I don't think it's unreasonable to say that the caller should wrap it before feeding it to web2.client.Request. Only the caller knows whether in-memory or on-disk buffering makes more sense. Maybe neither - if it could be produced once, maybe the best thing is to produce it again.

ProducerStream is already different in that it doesn't support .length. What's another optional operation?

--
Scott Lamb <http://www.slamb.org/>
On Sep 24, 2005, at 12:41 PM, Scott Lamb wrote:
On second thought, I don't buy this at all. You're saying that this wrapper should provide the buffering itself to provide restartability? Will the data be large / should this buffering happen in memory or on disk? For MemoryStream and FileStream, the answers differ. In both cases, the buffering is silly, though; they have all the information to reset themselves.
Doing any sort of automatic buffering is going to be a bad solution, because you won't be able to discard the buffer until you've got the final response from the other side, which may not happen until you've sent all 20GB of data, two days from now. If the server is well-designed, it will send an authorization denied error before you've uploaded everything, but you cannot rely on that. And you really don't want to have the server buffer all that data for two days. Of course, you also don't want to upload it twice, but some things you've just gotta live with.

So, anyhow, I think any buffering must be explicit so the developer has to explicitly ask to shoot themselves in the foot.

Also, I've tried to keep the stream API and implementation relatively simple and easy to understand. Some bits of it do get complicated already, but I think adding reset will complicate it quite a bit further for any non-trivial stream.

So, what do you do? Probably it'd be best to pass something from which you can get a stream, rather than a stream itself. Thus if you need to get the stream again, you can call the function again. No muss, no fuss.

James
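The "pass something from which you can get a stream" suggestion might be sketched as below. None of these names come from twisted.web2: `ChunkStream`, `Request`, `open_body`, `make_body`, and `fake_send` are all hypothetical, and the fake server simply answers 401 to the first attempt and 200 to the second. The sketch assumes synchronous reads.

```python
class ChunkStream:
    """Hypothetical single-use stream: read() yields chunks, then None."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def read(self):
        return self._chunks.pop(0) if self._chunks else None


class Request:
    """Hypothetical request that stores a stream factory, not a stream."""

    def __init__(self, method, uri, stream_factory):
        self.method = method
        self.uri = uri
        self.stream_factory = stream_factory

    def open_body(self):
        # Each call produces a brand-new stream positioned at the start.
        return self.stream_factory()


def drain(stream):
    """Read a stream to exhaustion and return the joined bytes."""
    parts = []
    while True:
        chunk = stream.read()
        if chunk is None:
            return b"".join(parts)
        parts.append(chunk)


def make_body():
    # The caller decides how the body is regenerated (memory, file, ...).
    return ChunkStream([b"user=scott", b"&action=post"])


bodies_seen = []


def fake_send(request):
    """Pretend server: rejects the first attempt, accepts the second."""
    bodies_seen.append(drain(request.open_body()))
    return 401 if len(bodies_seen) == 1 else 200


request = Request("POST", "/login", make_body)
status = fake_send(request)
if status == 401:
    # Retry: the factory is called again, so the postdata is intact.
    status = fake_send(request)
```

The retry path never touches the exhausted first stream, and nothing is buffered unless the factory itself chooses to buffer, which keeps that decision explicit and in the caller's hands.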
participants (3): James Y Knight, Jp Calderone, Scott Lamb