[Twisted-Python] Asynchronous gzipped content decompression: best approach
Hi,

I have written a small utility function to replace "twisted.web.client.getPage" so that I can read the response headers. I have to say that the ever-improving documentation made it quite easy to do using the new twisted.web.client.Agent, so well done to all!

Since my wrapper works quite well, I decided to add gzip response support, as it's another feature lacking from the original getPage. Again, it was quite simple, and it looks like it works quite well in a proof-of-concept scenario.

Then came my dilemma. What I'm doing now is a synchronous decompression, as shown below:

    compressedstream = StringIO.StringIO(inzip)
    gzipper = gzip.GzipFile(fileobj=compressedstream)
    _data = gzipper.read()
    return _data

This works quite well, but I wanted to add support for arbitrarily large compressed responses, and I wanted to ask your opinion on the best approach for this:

- A separate thread? It has its limits, as it doesn't scale well at all, but in the likely usage scenario for getPage that shouldn't be a big issue (i.e. not many concurrent calls).
- A producer/consumer? That sounded like the modern Twisted way of doing it, but I didn't manage to implement it properly, as I couldn't create a proper "consumer" class by looking at the example in the documentation...
- twisted.python.zipstream.DeflatedZipFileEntry? I found this and it seemed a potential way of doing it, maybe with the use of inline generators? But then I thought, is it too complex an approach for a simple problem?

I guess that decompressing data in Twisted should be a fairly common task, but I have not found a sample that looked like the "best" way of doing it, so... here is this email.

Thanks for your help, and I'll be happy to post the final code for future reference if anyone is interested.

Michele
On Fri, 2010-07-30 at 11:28 +0100, Michele - wrote:
Hi,
I have written a small utility function to replace "twisted.web.client.getPage", to be able to read the response header.
I have to say that the ever improving documentation made it quite easy for me to do it using the new twisted.web.client.Agent, so well done to all!
Since my wrapper works quite well, I decided to add gzip response support, as it's another feature lacking from the original getPage. Again, it was quite simple, and it looks like it works quite well in a proof-of-concept scenario.
Then came my dilemma. What I'm doing now is a synchronous decompression, as shown below:
    compressedstream = StringIO.StringIO(inzip)
    gzipper = gzip.GzipFile(fileobj=compressedstream)
    _data = gzipper.read()
    return _data
In the standard Agent API, streaming data is delivered to a protocol. So a gunzipping version would do the same: you have a wrapper protocol that uncompresses the data, then delivers it to the underlying protocol. The basic logic would require reimplementing a small part of the gzip module: the first few bytes of data are the gzip header, which you skip. Then use the zlib module to decompress data as it arrives (specifically, you'd want a decompression object) and deliver it to the wrapped protocol's dataReceived.
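
As a rough illustration of the wrapper idea, here is a minimal sketch in Python 3. It is not code from this thread: the names GzipDecodingProtocol and Collector are hypothetical, and the classes only mimic the dataReceived/connectionLost interface rather than importing Twisted. It also swaps in a shortcut for the header handling: instead of skipping the gzip header by hand, it passes wbits = 16 + zlib.MAX_WBITS to zlib.decompressobj, which asks zlib to parse the gzip header and trailer itself.

```python
import gzip
import zlib


class GzipDecodingProtocol:
    """Hypothetical wrapper: gunzips each chunk and forwards it to `wrapped`."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        # Assumption: 16 + MAX_WBITS lets zlib consume the gzip header and
        # trailer, so no manual header skipping is needed here.
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def dataReceived(self, data):
        # Decompress only what has arrived so far; nothing blocks on a full read.
        chunk = self._decompressor.decompress(data)
        if chunk:
            self.wrapped.dataReceived(chunk)

    def connectionLost(self, reason=None):
        # Deliver any output still buffered, then pass the event along.
        tail = self._decompressor.flush()
        if tail:
            self.wrapped.dataReceived(tail)
        self.wrapped.connectionLost(reason)


class Collector:
    """Stand-in for the underlying protocol; it just records what it receives."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def dataReceived(self, data):
        self.chunks.append(data)

    def connectionLost(self, reason=None):
        self.closed = True


def demo():
    payload = b"hello twisted " * 1000
    compressed = gzip.compress(payload)
    collector = Collector()
    proto = GzipDecodingProtocol(collector)
    # Feed the compressed bytes in small pieces, as a network transport would.
    for i in range(0, len(compressed), 64):
        proto.dataReceived(compressed[i:i + 64])
    proto.connectionLost()
    return collector.closed and b"".join(collector.chunks) == payload
```

In a real client the wrapper would presumably subclass twisted.internet.protocol.Protocol and be handed to the response's deliverBody, so that connection setup is also forwarded to the wrapped protocol; that part is left out here to keep the sketch independent of Twisted.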
participants (2)
- Itamar Turner-Trauring
- Michele -