Decompressing a file retrieved by URL seems too complex

John Nagle nagle at animats.com
Thu Aug 12 16:40:14 EDT 2010


(Repost with better indentation)
    I'm reading a URL which points to a .gz file, and
decompressing it.  This works, but it seems far too complex.
Yet none of the "wrapping" you might expect to work
actually does.  You can't wrap a GzipFile around
an HTTP connection, because GzipFile, reasonably enough,
needs random access, and tries to call "seek" and "tell"
on its underlying file.  Nor is the file object GzipFile
returns fully general: it fails on "readline", though it
accepts "read".  (There's no good reason for that.)  So I
had to make a second copy.
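
For concreteness, the naive wrapping fails as soon as you read
from it.  A minimal demonstration (Python 2, as in the code
below; the URL is just a placeholder):

import gzip, urllib2

nd = urllib2.urlopen("http://example.com/data.gz")  # placeholder URL
gd = gzip.GzipFile(fileobj=nd, mode="rb")  # constructing it succeeds...
data = gd.read()   # ...but reading raises AttributeError, since the
                   # HTTP response object has no "seek" or "tell"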

				John Nagle

import gzip
import tempfile
import urllib2

TIMEOUTSECS = 60    # assumed value; defined elsewhere in the original program

def readurl(url):
    if url.endswith(".gz"):
        nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
        td1 = tempfile.TemporaryFile()              # holds the compressed copy
        td1.write(nd.read())                        # fetch the whole file
        nd.close()                                  # done with the network
        td2 = tempfile.TemporaryFile()              # holds the decompressed copy
        td1.seek(0)                                 # rewind compressed copy
        gd = gzip.GzipFile(fileobj=td1, mode="rb")  # wrap unzip around it
        td2.write(gd.read())                        # decompress the file
        td1.close()                                 # done with compressed copy
        td2.seek(0)                                 # rewind
        return td2                                  # file object for the decompressed data
    else:
        return urllib2.urlopen(url, timeout=TIMEOUTSECS)
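
One possible simplification, not in the code above: zlib can
decompress the gzip format incrementally if you pass it a window
size of 16 + zlib.MAX_WBITS, which removes the need for temporary
files altogether.  A sketch under the same assumptions (Python 2,
TIMEOUTSECS defined as before); note it yields chunks of
decompressed data rather than returning a file object:

import urllib2, zlib

def readurl_stream(url, chunksize=64 * 1024):
    """Yield decompressed chunks of a gzipped URL, with no temp files."""
    nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)   # 16+ => expect a gzip header
    try:
        while True:
            chunk = nd.read(chunksize)
            if not chunk:
                break
            yield d.decompress(chunk)
        yield d.flush()                           # emit any buffered tail
    finally:
        nd.close()

For line-oriented reading you would still need to buffer the
chunks yourself, but the full file never has to land on disk.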


