really slow gzip decompress, why?

Diez B. Roggisch deets at nospam.web.de
Mon Jan 26 10:47:03 EST 2009


redbaron wrote:

> I have one big (6.9 GB) .gz file with text inside it.
> zcat bigfile.gz > /dev/null does the job in 4 minutes 50 seconds
> 
> The Python code has been doing the same job for 25 minutes and still
> hasn't finished =( The code is the simplest I could imagine:
> 
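> import gzip, sys
> 
> def main():
>     fh = gzip.open(sys.argv[1])
>     all(fh)  # every line yielded is non-empty (truthy), so all() reads the whole file
> 
> main()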
> 
> As far as I understand, most of the time is spent executing C code, so
> Python's overhead shouldn't be noticeable. Why is it so slow?

I'm guessing here, but if gzip streams (and AFAIK it does), the command line
will simply stream the decompressed output to /dev/null.
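For comparison, a rough Python analogue of `zcat bigfile.gz > /dev/null` might read the decompressed data in fixed-size chunks and throw each one away. This is a minimal sketch assuming Python 3 (where `gzip.open` supports `with`); `drain_gzip` and `CHUNK_SIZE` are hypothetical names, and the 1 MiB chunk size is an arbitrary choice.

```python
import gzip

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary value chosen for this sketch


def drain_gzip(path, chunk_size=CHUNK_SIZE):
    """Decompress `path` (a filename or binary file object) and discard the output.

    Only one chunk of decompressed data is held at a time, similar to
    streaming zcat's output into /dev/null. Returns the number of
    decompressed bytes seen.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # b"" marks the end of the decompressed stream
                break
            total += len(chunk)
    return total
```

Usage would be something like `drain_gzip("bigfile.gz")`; timing it against the line-by-line `all(fh)` loop would show how much of the difference comes from the Python side.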

OTOH, Python is not streaming; it will instead allocate buffers for the
whole file, which, since 6.9 GB is the *compressed* size, might take a while.
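To make the streaming-versus-buffering distinction concrete, here is a sketch of explicitly incremental decompression with Python 3's `zlib.decompressobj`, where memory use stays bounded no matter how large the data expands. `stream_decompress`, `read_size` and `max_out` are hypothetical names chosen for illustration. The sketch handles a single gzip member only, and it does not show whether buffering is actually what slows down the original loop; that part remains a guess.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header and trailer


def stream_decompress(src, read_size=64 * 1024, max_out=1 << 20):
    """Incrementally decompress a single-member gzip stream from file object `src`.

    Chunks yielded from the main loop are at most `max_out` bytes, so the
    whole decompressed file is never held in memory at once.
    """
    d = zlib.decompressobj(GZIP_WBITS)
    while not d.eof:
        # Input that was read earlier but not yet processed, because the
        # previous call stopped at max_out bytes of output.
        data = d.unconsumed_tail
        if not data:
            data = src.read(read_size)
            if not data:
                break  # input exhausted (truncated stream); flush() below drains the rest
        out = d.decompress(data, max_out)
        if out:
            yield out
    tail = d.flush()
    if tail:
        yield tail
```

For example, `for chunk in stream_decompress(open("bigfile.gz", "rb")): pass` decompresses the file while keeping at most roughly `read_size + max_out` bytes of buffers alive.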

Diez


