Decompressing a file retrieved by URL seems too complex

Thomas Jollans thomas at jollybox.de
Thu Aug 12 17:40:27 EDT 2010


On Thursday 12 August 2010, it occurred to John Nagle to exclaim:
> (Repost with better indentation)

Good, good.

> 
> def readurl(url) :
>      if url.endswith(".gz") :

The file name could be anything. You should be checking the response
Content-Type header -- that's what it's for.
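
Untested, but something along these lines should do it (Python 2's urllib2,
with TIMEOUTSECS as in your code; the exact MIME types servers send for
gzip vary, so the tuple below is a guess):

    import urllib2

    nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
    # info() returns a mimetools.Message; gettype() is the bare MIME type
    if nd.info().gettype() in ('application/x-gzip', 'application/gzip'):
        ...  # decompress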

>          nd = urllib2.urlopen(url,timeout=TIMEOUTSECS)
>          td1 = tempfile.TemporaryFile()	# compressed file

You can keep the whole thing in memory by using StringIO.
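
E.g. (untested; nd as in your code):

    import StringIO

    td1 = StringIO.StringIO(nd.read())   # compressed bytes, in memory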

>          td1.write(nd.read())	# fetch and copy file

You're reading the entire file into memory anyway ;-)

>          nd.close() # done with network
>          td2 = tempfile.TemporaryFile()	# decompressed file

Okay, maybe there is something missing from GzipFile -- but still, you
could use StringIO again, I expect.
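
Something like this (untested; td1 is the StringIO from above, and a
second in-memory buffer stands in for your td2 -- no seek or close
bookkeeping needed):

    import gzip, StringIO

    gd = gzip.GzipFile(fileobj=td1, mode="rb")   # wrap unzip
    td2 = StringIO.StringIO(gd.read())           # decompressed, seekable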

> Nor is the output descriptor from gzip general; it fails
> on "readline", but accepts "read".

>>> from gzip import GzipFile
>>> GzipFile.readline
<unbound method GzipFile.readline>
>>> GzipFile.readlines
<unbound method GzipFile.readlines>
>>> GzipFile.__iter__
<unbound method GzipFile.__iter__>
>>> 

What exactly is it that's failing, and how?


>          td1.seek(0) # rewind
>          gd = gzip.GzipFile(fileobj=td1, mode="rb") # wrap unzip
>          td2.write(gd.read()) # decompress file
>          td1.close() # done with compressed copy
>          td2.seek(0) # rewind
>          return(td2) # return file object for decompressed object
>      else :
>          return(urllib2.urlopen(url,timeout=TIMEOUTSECS))
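
For what it's worth, with the Content-Type check and StringIO the whole
function might collapse to something like this (Python 2, untested
sketch; TIMEOUTSECS as in your code, and the gzip MIME types are again
a guess):

    import gzip
    import urllib2
    from StringIO import StringIO

    def readurl(url):
        nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
        if nd.info().gettype() not in ('application/x-gzip',
                                       'application/gzip'):
            return nd                     # not compressed; pass through
        buf = StringIO(nd.read())         # compressed copy, in memory
        nd.close()                        # done with the network
        # a StringIO of the decompressed bytes supports read, readline,
        # readlines and iteration, unlike (apparently) your GzipFile
        return StringIO(gzip.GzipFile(fileobj=buf, mode="rb").read())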


